Kubernetes Internals
Dissecting Kubernetes
eastbright.k@gmail.com
DongHyeon Kim
Agenda
‣ Understanding Kubernetes Components
‣ Understanding Networking
‣ Understanding Pod Networking
‣ Understanding Service Networking
Understanding Kubernetes Components
Kubernetes Component
▸ Master Component
  ▸ Provides the cluster's control plane
▸ Node Component
  ▸ Provides the Kubernetes runtime environment
▸ Add-on Component
  ▸ Pods and Services that implement additional cluster features
Master Component
▸ kube-apiserver
  ▸ Serves the Kubernetes API endpoint
▸ etcd
  ▸ Key-value store that holds all cluster data
▸ kube-scheduler
  ▸ Detects Pods that have no Node assigned and selects the Node each one will run on
▸ kube-controller-manager
  ▸ Runs multiple controllers, each of which manages Kubernetes resources
▸ cloud-controller-manager
  ▸ Interacts with the cloud provider
Node Component
▸ kubelet
  ▸ Agent that runs on every host in the cluster
▸ kube-proxy
  ▸ Implements the Service abstraction (userspace, iptables, ipvs, and kernelspace modes)
▸ Container-Runtime
  ▸ Responsible for running containers
  ▸ Any runtime that implements the Container Runtime Interface
Add-on Component
▸ DNS
  ▸ Provides service discovery
▸ CNI (Container Network Interface)
  ▸ Provides networking between Pods
▸ Dashboard
▸ Monitoring
▸ Logging
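Component Interdependencies
▸ Components almost always send their requests to the API Server
▸ The API Server sends requests to the Kubelet only for a few operations (for example, fetching logs or running exec)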
Understanding API Server
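▸ Requests pass through authentication, authorization, admission control, and validation before the resource is stored in etcd
▸ Propagates resource changes to clients
▸ Provides only storage and notification of resources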
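The request path described above can be pictured as a chain of stages that must all pass before anything is persisted. The following is a minimal sketch, not the API Server's actual code: `request`, `stage`, `handle`, and the in-memory `store` map standing in for etcd are hypothetical names, and the rule that only "admin" may create is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// request is a hypothetical, heavily simplified API request.
type request struct {
	User   string
	Verb   string
	Key    string // storage key, e.g. "pods/default/web"
	Object string // serialized resource
}

// stage is one step of the pipeline; a non-nil error rejects the request.
type stage func(r request) error

func authenticate(r request) error {
	if r.User == "" {
		return errors.New("authentication: unknown user")
	}
	return nil
}

func authorize(r request) error {
	// Assumption for the sketch: only "admin" may create resources.
	if r.Verb == "create" && r.User != "admin" {
		return errors.New("authorization: forbidden")
	}
	return nil
}

// admit stands in for admission control; this sketch accepts everything.
func admit(r request) error { return nil }

func validate(r request) error {
	if r.Object == "" {
		return errors.New("validation: empty object")
	}
	return nil
}

// handle runs the stages in order and persists only if all of them pass.
func handle(store map[string]string, r request) error {
	for _, s := range []stage{authenticate, authorize, admit, validate} {
		if err := s(r); err != nil {
			return err
		}
	}
	store[r.Key] = r.Object // stand-in for the write to etcd
	return nil
}

func main() {
	store := map[string]string{}
	fmt.Println(handle(store, request{User: "admin", Verb: "create", Key: "pods/default/web", Object: "{}"}))
	fmt.Println(handle(store, request{User: "guest", Verb: "create", Key: "pods/default/db", Object: "{}"}))
	fmt.Println("stored objects:", len(store))
}
```

Keeping every stage behind the same function type makes the ordering explicit: a request rejected by an early stage never reaches storage.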
Watch Interface of API Server
▸ Provides a watch interface for each resource type
▸ Notification (publish/subscribe) over HTTP
▸ Supports HTTP/1.0 and HTTP/1.1
Watch Interface of API Server
$ curl --http1.0 http://localhost:8080/api/v1/pods?watch=true
$ tcpdump -nlA -i lo port 8080
05:42:32.087199 IP 127.0.0.1.47318 > 127.0.0.1.8080: Flags [P.], seq 1:101, ack 1, win 342, options [nop,nop,TS val 926521512 ecr 926521512], length 100: HTTP: GET /api/v1/pods?watch=true HTTP/1.0
E...9<@.@.."............rC.u...,...V.......
79..79..GET /api/v1/pods?watch=true HTTP/1.0
Host: localhost:8080
User-Agent: curl/7.58.0
Accept: */*
05:42:32.087785 IP 127.0.0.1.8080 > 127.0.0.1.47318: Flags [P.], seq 1:89, ack 101, win 342, options [nop,nop,TS val 926521513 ecr 926521512], length 88: HTTP: HTTP/1.0 200 OK
E...`c@.@..................,rC.....V.......
79..79..HTTP/1.0 200 OK
Content-Type: application/json
Date: Fri, 22 Mar 2019 05:42:32 GMT
05:42:32.090370 IP 127.0.0.1.8080 > 127.0.0.1.47318: Flags [P.], seq 56470:60566, ack 101, win 342, options [nop,nop,TS val 926521516 ecr 926521515], length 4096: HTTP
{"type":"ADDED","object":{ ... }}
...
Watch Interface of API Server
$ curl http://localhost:8080/api/v1/pods?watch=true
$ tcpdump -nlA -i lo port 8080
05:33:24.628863 IP 127.0.0.1.44242 > 127.0.0.1.8080: Flags [P.], seq 1:101, ack 1, win 342, options [nop,nop,TS val 925974024 ecr 925974024], length 100: HTTP: GET /api/v1/pods?watch=true HTTP/1.1
E....w@.@.o..............Q..jn.....V.......
71>.71>.GET /api/v1/pods?watch=true HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.58.0
Accept: */*
05:33:24.629526 IP 127.0.0.1.8080 > 127.0.0.1.44242: Flags [P.], seq 1:117, ack 101, win 342, options [nop,nop,TS val 925974025 ecr 925974024], length 116: HTTP: HTTP/1.1 200 OK
E...;_@.@...............jn...Q.=...V.......
71> 71>.HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 22 Mar 2019 05:33:24 GMT
Transfer-Encoding: chunked
9cf
{"type":"ADDED","object":{ ... }}
aab
{"type":"MODIFIED","object":{ ... }}
....
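The captures show that a watch response is a long-lived stream of JSON objects, each with a `type` and an `object`. The following is a minimal sketch of how a client might consume such a stream, assuming only that envelope. `watchEvent`, `readEvents`, and `parseEvents` are hypothetical names; a real client would pass the response body from `net/http`, which removes the chunk-size lines (`9cf`, `aab`) before the decoder sees the data.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// watchEvent mirrors the {"type": ..., "object": ...} envelope seen in the capture.
type watchEvent struct {
	Type   string          `json:"type"`
	Object json.RawMessage `json:"object"`
}

// readEvents decodes concatenated JSON watch events until the stream ends.
func readEvents(r io.Reader) ([]watchEvent, error) {
	dec := json.NewDecoder(r)
	var events []watchEvent
	for {
		var ev watchEvent
		err := dec.Decode(&ev)
		if err == io.EOF {
			return events, nil
		}
		if err != nil {
			return events, err
		}
		events = append(events, ev)
	}
}

// parseEvents is a convenience wrapper for in-memory input.
func parseEvents(s string) ([]watchEvent, error) {
	return readEvents(strings.NewReader(s))
}

func main() {
	// Stand-in for a response body; the objects here are made up.
	body := `{"type":"ADDED","object":{"kind":"Pod"}}` + "\n" +
		`{"type":"MODIFIED","object":{"kind":"Pod"}}` + "\n"
	events, err := parseEvents(body)
	if err != nil {
		panic(err)
	}
	for _, ev := range events {
		fmt.Println(ev.Type, string(ev.Object))
	}
}
```

A long-running watcher would handle each event as it is decoded instead of collecting them in a slice; the slice only keeps the sketch easy to inspect.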
Understanding Scheduler
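▸ Detects Pods that have no Node assigned and assigns a Node to each of them
▸ Modifies only the spec.nodeName field (Pod.PodSpec.NodeName)
▸ Filters the list of Nodes on which the Pod can be scheduled
▸ Ranks the feasible Nodes by priority and selects the best one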
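The two phases above, filtering and then ranking, can be sketched as follows. This is an illustration under simplifying assumptions, not the scheduler's implementation: the `node`, `pod`, `predicate`, and `priority` types, the single CPU predicate, and the "most free CPU wins" score are all hypothetical.

```go
package main

import "fmt"

type node struct {
	Name    string
	FreeCPU int // free millicores, a unit chosen for the sketch
}

type pod struct {
	Name     string
	CPU      int    // requested millicores
	NodeName string // stays empty until the Pod is scheduled
}

type predicate func(p pod, n node) bool // filtering phase
type priority func(p pod, n node) int   // ranking phase

// schedule keeps the nodes that pass every predicate, then picks the
// highest-scoring one. It returns false when no node is feasible.
func schedule(p *pod, nodes []node, preds []predicate, score priority) bool {
	best, bestScore, found := "", 0, false
	for _, n := range nodes {
		feasible := true
		for _, pred := range preds {
			if !pred(*p, n) {
				feasible = false
				break
			}
		}
		if !feasible {
			continue
		}
		if s := score(*p, n); !found || s > bestScore {
			best, bestScore, found = n.Name, s, true
		}
	}
	if found {
		p.NodeName = best // the only field this sketch writes
	}
	return found
}

func fitsCPU(p pod, n node) bool    { return n.FreeCPU >= p.CPU }
func mostFreeCPU(p pod, n node) int { return n.FreeCPU - p.CPU }

func main() {
	nodes := []node{{"node-a", 500}, {"node-b", 2000}, {"node-c", 1000}}
	p := pod{Name: "web", CPU: 800}
	if schedule(&p, nodes, []predicate{fitsCPU}, mostFreeCPU) {
		fmt.Println(p.Name, "assigned to", p.NodeName)
	} else {
		fmt.Println(p.Name, "is unschedulable")
	}
}
```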
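Default Filtering Policies of the Scheduler
▸ Does the Node have at least as much free capacity as the Pod's resource requests?
▸ Does the Node have labels that match the Pod's nodeSelector?
▸ If the Pod requests a specific host port binding, is that port still unused on the Node?
▸ If the Pod requests a specific volume, can the Node provide that volume?
▸ Does the Pod tolerate the Node's taints?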
▸ …
▸ kubernetes/pkg/scheduler/core/generic_scheduler.go
▸ kubernetes/pkg/scheduler/algorithm/predicates/predicates.go
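Each question in the list above is a yes/no check on a (Pod, Node) pair. The following sketch expresses four of them as plain functions; the volume check is left out. The types and field names are hypothetical simplifications and do not match the signatures in predicates.go. In particular, taints and tolerations are reduced to bare keys.

```go
package main

import "fmt"

type nodeInfo struct {
	FreeMemoryMiB int
	Labels        map[string]string
	UsedHostPorts map[int]bool
	Taints        []string // taint keys only, a simplification
}

type podSpec struct {
	MemoryRequestMiB int
	NodeSelector     map[string]string
	HostPorts        []int
	Tolerations      []string // tolerated taint keys
}

func fitsResources(p podSpec, n nodeInfo) bool {
	return n.FreeMemoryMiB >= p.MemoryRequestMiB
}

func matchesNodeSelector(p podSpec, n nodeInfo) bool {
	for k, want := range p.NodeSelector {
		if got, ok := n.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func hostPortsFree(p podSpec, n nodeInfo) bool {
	for _, port := range p.HostPorts {
		if n.UsedHostPorts[port] {
			return false
		}
	}
	return true
}

func toleratesTaints(p podSpec, n nodeInfo) bool {
	tolerated := map[string]bool{}
	for _, key := range p.Tolerations {
		tolerated[key] = true
	}
	for _, taint := range n.Taints {
		if !tolerated[taint] {
			return false
		}
	}
	return true
}

type check struct {
	name string
	ok   func(podSpec, nodeInfo) bool
}

var checks = []check{
	{"resources", fitsResources},
	{"nodeSelector", matchesNodeSelector},
	{"hostPort", hostPortsFree},
	{"taints", toleratesTaints},
}

// firstFailure returns the name of the first failed check,
// or "" when the node passes all of them.
func firstFailure(p podSpec, n nodeInfo) string {
	for _, c := range checks {
		if !c.ok(p, n) {
			return c.name
		}
	}
	return ""
}

func main() {
	p := podSpec{MemoryRequestMiB: 512, NodeSelector: map[string]string{"disk": "ssd"}, HostPorts: []int{8080}}
	n := nodeInfo{FreeMemoryMiB: 1024, Labels: map[string]string{"disk": "hdd"}}
	fmt.Printf("first failed check: %q\n", firstFailure(p, n))
}
```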
Understanding Controller
▸ Drive current state (status) → desired state (spec)
▸ Controllers do not communicate with each other
▸ Controllers do not communicate with the Scheduler
▸ Controllers do not communicate with the Kubelet
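Driving the current state toward the desired state amounts to a compare-and-correct loop. The following is a minimal sketch of one such step for a replica count. `replicaSet`, `action`, and `reconcile` are hypothetical names, and the sketch returns the corrective action instead of calling the API Server.

```go
package main

import "fmt"

// replicaSet is a hypothetical, minimal stand-in for the real resource.
type replicaSet struct {
	Name    string
	Desired int // plays the role of spec.replicas
}

// action is what the controller would ask the API Server to do.
type action struct {
	Verb  string // "create", "delete", or "none"
	Count int
}

// reconcile compares the observed state with the desired state
// and returns the action that closes the gap.
func reconcile(rs replicaSet, runningPods int) action {
	diff := rs.Desired - runningPods
	switch {
	case diff > 0:
		return action{Verb: "create", Count: diff}
	case diff < 0:
		return action{Verb: "delete", Count: -diff}
	default:
		return action{Verb: "none", Count: 0}
	}
}

func main() {
	rs := replicaSet{Name: "web", Desired: 3}
	running := 1
	for {
		a := reconcile(rs, running)
		fmt.Printf("running=%d action=%s count=%d\n", running, a.Verb, a.Count)
		if a.Verb == "none" {
			break
		}
		// Pretend the action took effect before the next observation.
		if a.Verb == "create" {
			running += a.Count
		} else {
			running -= a.Count
		}
	}
}
```

Because the step depends only on what it observes, running it again after a failure or a restart is harmless; it simply computes the remaining gap.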
Replication Manager, ReplicaSet Controller
Endpoint Controller
Understanding Kubelet
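▸ Responsible for everything that runs on a worker Node
▸ On first start, registers the host it runs on as a Node resource
▸ Runs the Pods scheduled to its Node as containers
▸ Continuously monitors running containers and reports their status, events, and resource consumption to the API Server
▸ Runs readiness and liveness probes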
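How repeated probe results turn into a decision can be sketched with a consecutive-failure counter. This is an illustration only: `prober` and `observe` are hypothetical names, and the sketch models just the failure-threshold idea, not the Kubelet's probe manager.

```go
package main

import "fmt"

// prober tracks consecutive probe failures for one container.
type prober struct {
	FailureThreshold int
	failures         int
}

// observe records one probe result and reports whether the threshold was
// reached, i.e. whether the sketch would act on the container (restart it
// for a liveness probe, mark it not ready for a readiness probe).
func (p *prober) observe(success bool) bool {
	if success {
		p.failures = 0
		return false
	}
	p.failures++
	return p.failures >= p.FailureThreshold
}

func main() {
	p := &prober{FailureThreshold: 3}
	results := []bool{false, false, true, false, false, false}
	for i, ok := range results {
		fmt.Printf("probe %d success=%v act=%v\n", i+1, ok, p.observe(ok))
	}
}
```

A single success resets the counter, so only an unbroken run of failures triggers an action.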
Understanding Kubelet
▸ Can also create Pods from files in a specific local directory
How the Components Cooperate
Understanding Kube-Proxy
▸ Kube-Proxy runs on every Node (deployed as a DaemonSet)
▸ Implements the Service abstraction
▸ Supports userspace, iptables, ipvs, and kernelspace modes
// cmd/kube-proxy/app/server.go
const (
    proxyModeUserspace   = "userspace"
    proxyModeIPTables    = "iptables"
    proxyModeIPVS        = "ipvs"
    proxyModeKernelspace = "kernelspace" // for Windows
)
Understanding Kube-Proxy
Proxy Mode (Userspace Mode)
kubernetes/pkg/proxy/userspace/proxier.go
Non-Proxy Mode (iptables, ipvs)
kubernetes/pkg/proxy/iptables/proxier.go
kubernetes/pkg/proxy/ipvs/proxier.go
Understanding DNS
▸ Watches Services, Endpoints, and Pods through the API Server's watch interface
▸ Keeps DNS information up to date
▸ A DNS record may be briefly invalid while a resource is being updated
▸ Registered as the nameserver in /etc/resolv.conf inside every container deployed to the cluster (under the default DNS policy)
▸ pkg/kubelet/network/dns/dns.go (SetupDNSinContainerizedMounter)
root@k8s-master:/home/h# kubectl exec -it sample cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.k8s svc.k8s k8s
options ndots:5
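The `search` and `ndots` lines decide which fully qualified names are tried for a short name. The following sketch follows the conventional resolv.conf rules: a name with fewer dots than `ndots` is tried with each search domain first, and a name ending in a dot is used as is. `candidates` is a hypothetical helper, and real resolver libraries differ in details.

```go
package main

import (
	"fmt"
	"strings"
)

// candidates returns the fully qualified names a stub resolver would try,
// in order, for the given search list and ndots value.
func candidates(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name} // already fully qualified
	}
	var expanded []string
	for _, domain := range search {
		expanded = append(expanded, name+"."+domain+".")
	}
	absolute := name + "."
	if strings.Count(name, ".") >= ndots {
		// Enough dots: try the name as given first, then the search list.
		return append([]string{absolute}, expanded...)
	}
	// Too few dots: walk the search list first, then the name as given.
	return append(expanded, absolute)
}

func main() {
	// Search list and ndots taken from the resolv.conf shown above.
	search := []string{"default.svc.k8s", "svc.k8s", "k8s"}
	for _, name := range []string{"sample", "sample.default", "example.com."} {
		fmt.Println(name, "->", candidates(name, search, 5))
	}
}
```

With `ndots:5`, even a name such as `www.example.com` walks the whole search list before it is tried as given, which is why external lookups from a Pod can produce several failed queries first.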
Understanding Networking
Networking Model
▸ Container to Container Networking
  ▸ Shared network namespace (localhost communication)
▸ Pod to Pod Networking
  ▸ CNI
▸ Pod to Service Networking
  ▸ Service
▸ External to Service Networking
  ▸ Service
Understanding Pod Networking
Requirements of CNI
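▸ A Pod on a Node must be able to communicate with every Pod on every Node without NAT
▸ Agents on a Node must be able to communicate with every Pod on that Node
▸ Pods running in the host network of a Node must be able to communicate with every Pod on every Node without NAT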
NAT-less?
Pod to Pod Networking (same Node)
Pod to Pod Networking (different Nodes)
Understanding Service Networking
Service Networking
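▸ Everything related to Services is handled by Kube-Proxy
▸ Each Service has its own IP and port
▸ Service IP == Virtual IP
▸ When Kube-Proxy detects that a Service has been created, it creates rules according to its mode
▸ When the destination is a Service, the destination address is rewritten (DNAT) to the address of one of the Pods backing the Service, and the packet is redirected there
▸ When a Service is accessed from outside the Pods, both SNAT (to the Node's IP) and DNAT (to the Pod's IP) occur (when DSR is not supported)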
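The DNAT step and its reversal on the reply can be sketched with a small translation table. This is a conceptual model only: the kernel's connection tracking does this work in practice, and `addr`, `packet`, `natTable`, `outbound`, and `inbound` are hypothetical names. Round-robin backend selection is an assumption that keeps the sketch deterministic, and SNAT is left out; it would rewrite the source field in the same way.

```go
package main

import "fmt"

type addr struct {
	IP   string
	Port int
}

type packet struct{ Src, Dst addr }

// natTable remembers, for every translated flow, how to undo the
// translation when the reply comes back.
type natTable struct {
	service  addr
	backends []addr
	next     int
	reverse  map[packet]packet // expected reply -> reply as the client must see it
}

func newNATTable(service addr, backends []addr) *natTable {
	return &natTable{service: service, backends: backends, reverse: map[packet]packet{}}
}

// outbound applies DNAT to a packet addressed to the Service IP.
func (t *natTable) outbound(p packet) packet {
	if p.Dst != t.service || len(t.backends) == 0 {
		return p
	}
	backend := t.backends[t.next%len(t.backends)]
	t.next++
	t.reverse[packet{Src: backend, Dst: p.Src}] = packet{Src: t.service, Dst: p.Src}
	return packet{Src: p.Src, Dst: backend}
}

// inbound restores the Service address on a reply that belongs to a known flow.
func (t *natTable) inbound(p packet) packet {
	if orig, ok := t.reverse[p]; ok {
		return orig
	}
	return p
}

func main() {
	// Example addresses for illustration only.
	service := addr{"2.2.2.2", 8080}
	nat := newNATTable(service, []addr{{"1.1.2.1", 8080}})

	request := packet{Src: addr{"1.1.1.1", 54122}, Dst: service}
	translated := nat.outbound(request)
	fmt.Println("request after DNAT:", translated)

	reply := packet{Src: translated.Dst, Dst: translated.Src}
	fmt.Println("reply as seen by the client:", nat.inbound(reply))
}
```

The client never sees the backend address: the reply is rewritten so that it appears to come from the Service IP the client originally contacted.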
iptables chain traversal
Service (iptables mode)
$ iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target                     prot opt source           destination
KUBE-SERVICES              all  --  anywhere         anywhere      /* kubernetes service portals */

Chain KUBE-SERVICES (2 references)
target                     prot opt source           destination
KUBE-MARK-MASQ             tcp  --  !192.168.0.0/16  2.2.2.2       tcp dpt:http-alt
KUBE-SVC-ZE62HOGUXOIF3MJ5  tcp  --  anywhere         2.2.2.2       tcp dpt:http-alt
KUBE-NODEPORTS             all  --  anywhere         anywhere      ADDRTYPE match dst-type LOCAL

Chain KUBE-NODEPORTS (1 references)
target                     prot opt source           destination
KUBE-MARK-MASQ             tcp  --  anywhere         anywhere      tcp dpt:30001
KUBE-SVC-ZE62HOGUXOIF3MJ5  tcp  --  anywhere         anywhere      tcp dpt:30001

Chain KUBE-SVC-ZE62HOGUXOIF3MJ5 (2 references)
target                     prot opt source           destination
KUBE-SEP-7AE52TSMNDEGV6BO  all  --  anywhere         anywhere      statistic mode random probability 0.50000000000
KUBE-SEP-GEQ73U43LIPSQP2Z  all  --  anywhere         anywhere

Chain KUBE-SEP-7AE52TSMNDEGV6BO (1 references)
target                     prot opt source           destination
KUBE-MARK-MASQ             all  --  1.1.1.1          anywhere
DNAT                       tcp  --  anywhere         anywhere      tcp to:1.1.1.1:8080

Chain KUBE-SEP-GEQ73U43LIPSQP2Z (1 references)
target                     prot opt source           destination
KUBE-MARK-MASQ             all  --  1.1.2.1          anywhere
DNAT                       tcp  --  anywhere         anywhere      tcp to:1.1.2.1:8080
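In the KUBE-SVC chain, the first endpoint rule matches with probability 0.5 and the last rule matches unconditionally. Because rules are evaluated in order and the first match wins, an even split over n endpoints needs rule i (counting from 0) to match with probability 1/(n-i). The following sketch works through that arithmetic; `ruleProbabilities` and `effectiveShare` are hypothetical helpers, not kube-proxy functions.

```go
package main

import "fmt"

// ruleProbabilities returns the per-rule match probability for n endpoint
// rules that are evaluated in order, where the first match wins.
func ruleProbabilities(n int) []float64 {
	probs := make([]float64, n)
	for i := range probs {
		probs[i] = 1.0 / float64(n-i)
	}
	return probs
}

// effectiveShare returns the overall fraction of traffic each rule receives,
// given that a rule is only reached when all earlier rules did not match.
func effectiveShare(probs []float64) []float64 {
	shares := make([]float64, len(probs))
	remaining := 1.0
	for i, p := range probs {
		shares[i] = remaining * p
		remaining -= shares[i]
	}
	return shares
}

func main() {
	for _, n := range []int{2, 3, 4} {
		probs := ruleProbabilities(n)
		fmt.Printf("n=%d rule probabilities=%.4f shares=%.4f\n", n, probs, effectiveShare(probs))
	}
}
```

For two endpoints this gives 0.5 for the first rule and 1 for the second, which matches the listing above.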
Packet flow of Service
[Figure: example topology used in the packet traces]
▸ Node 1 (3.3.3.1): Pod A (1.1.1.1), Pod B1 (1.1.1.2)
▸ Node 2 (3.3.3.2): Pod B2 (1.1.2.1), Pod B3 (1.1.2.2)
▸ Service A: 2.2.2.2:8080, node port 30001 on each Node
▸ External: 3.3.3.3
Packet flow of Service (External to Service)
Node 1 interface:
$ tcpdump -i enp0s8 port 30001 -n
05:17:50.632656 IP 3.3.3.3.55824 > 3.3.3.1.30001: Flags [SEW], seq 920096640, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 449946274 ecr 0,sackOK,eol], length 0
05:17:50.632886 IP 3.3.3.1.30001 > 3.3.3.3.55824: Flags [S.E], seq 2034560536, ack 920096641, win 28960, options [mss 1460,sackOK,TS val 167059923 ecr 449946274,nop,wscale 7], length 0
Pod B2 interface:
$ tcpdump -i cali27c81818b22 -n
05:17:50.632712 IP 3.3.3.1.55824 > 1.1.2.1.8080: Flags [SEW], seq 920096640, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 449946274 ecr 0,sackOK,eol], length 0
05:17:50.632874 IP 1.1.2.1.8080 > 3.3.3.1.55824: Flags [S.E], seq 2034560536, ack 920096641, win 28960, options [mss 1460,sackOK,TS val 167059923 ecr 449946274,nop,wscale 7], length 0
Packet flow of Service (External to Service)
[Figure: the same topology, annotated with the packet at each step]
▸ 1. External to Node 1: src: 3.3.3.3, dst: 3.3.3.1:30001
▸ 2. Node 1 to Pod B2: src: 3.3.3.1, dst: 1.1.2.1:8080
▸ 3. Pod B2 to Node 1: src: 1.1.2.1:8080, dst: 3.3.3.1
▸ 4. Node 1 to External: src: 3.3.3.1:30001, dst: 3.3.3.3
Packet flow of Service (Pod to Service)
Pod A interface:
$ tcpdump -i calib43f921251f -n
05:14:05.077057 IP 1.1.1.1.54122 > 2.2.2.2.8080: Flags [S], seq 1210630612, win 29200, options [mss 1460,sackOK,TS val 2710881183 ecr 0,nop,wscale 7], length 0
05:14:05.077767 IP 2.2.2.2.8080 > 1.1.1.1.54122: Flags [S.], seq 4123667957, ack 1210630613, win 28960, options [mss 1460,sackOK,TS val 411294588 ecr 2710881183,nop,wscale 7], length 0
Pod B2 interface:
$ tcpdump -i cali27c81818b22 -n
05:14:05.099668 IP 1.1.1.1.54122 > 1.1.2.1.8080: Flags [S], seq 1210630612, win 29200, options [mss 1460,sackOK,TS val 2710881183 ecr 0,nop,wscale 7], length 0
05:14:05.099826 IP 1.1.2.1.8080 > 1.1.1.1.54122: Flags [S.], seq 4123667957, ack 1210630613, win 28960, options [mss 1460,sackOK,TS val 411294588 ecr 2710881183,nop,wscale 7], length 0
Packet flow of Service (Pod to Service)
[Figure: the same topology, annotated with the packet at each step]
▸ 1. Pod A to Service A: src: 1.1.1.1, dst: 2.2.2.2:8080
▸ 2. After DNAT, to Pod B2: src: 1.1.1.1, dst: 1.1.2.1:8080
▸ 3. Pod B2 replies: src: 1.1.2.1:8080, dst: 1.1.1.1
▸ 4. Reply as seen by Pod A: src: 2.2.2.2:8080, dst: 1.1.1.1
Packet flow of Service (Pod to self-Service)
Pod B2 interface:
$ tcpdump -i calib43f921251f -n
05:15:59.556723 IP 1.1.2.1.54308 > 2.2.2.2.8080: Flags [S], seq 4048875942, win 29200, options [mss 1460,sackOK,TS val 2710995663 ecr 0,nop,wscale 7], length 0
05:15:59.556770 IP 3.3.3.2.54308 > 1.1.2.1.8080: Flags [S], seq 4048875942, win 29200, options [mss 1460,sackOK,TS val 2710995663 ecr 0,nop,wscale 7], length 0
05:15:59.556779 IP 1.1.2.1.8080 > 3.3.3.2.54308: Flags [S.], seq 2680204874, ack 4048875943, win 28960, options [mss 1460,sackOK,TS val 1749589035 ecr 2710995663,nop,wscale 7], length 0
05:15:59.556785 IP 2.2.2.2.8080 > 1.1.2.1.54308: Flags [S.], seq 2680204874, ack 4048875943, win 28960, options [mss 1460,sackOK,TS val 1749589035 ecr 2710995663,nop,wscale 7], length 0
Packet flow of Service (Pod to self-Service)
[Figure: the same topology, annotated with the packet at each step]
▸ 1. Pod B2 to Service A: src: 1.1.2.1, dst: 2.2.2.2:8080
▸ 2. After DNAT and SNAT, back to Pod B2: src: 3.3.3.2, dst: 1.1.2.1:8080
▸ 3. Pod B2 replies: src: 1.1.2.1:8080, dst: 3.3.3.2
▸ 4. Reply as seen by Pod B2: src: 2.2.2.2:8080, dst: 1.1.2.1
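The three traces differ in whether the source address is rewritten, and the KUBE-MARK-MASQ rules in the iptables listing explain why. The sketch below condenses those rules into one function. `needsMasquerade` is a hypothetical helper. The Pod CIDR is a parameter: the example passes 1.1.0.0/16 to match the Pod addresses in the diagrams, which is an assumption, since the rule listing shows 192.168.0.0/16.

```go
package main

import (
	"fmt"
	"net/netip"
)

// needsMasquerade reports whether a packet to the Service would be marked
// for SNAT, following the three KUBE-MARK-MASQ rules in the listing.
// It panics on malformed addresses, which is acceptable for a sketch.
func needsMasquerade(src, podCIDR string, viaNodePort bool, endpoint string) bool {
	srcAddr := netip.MustParseAddr(src)
	switch {
	case viaNodePort:
		return true // KUBE-NODEPORTS marks every packet
	case !netip.MustParsePrefix(podCIDR).Contains(srcAddr):
		return true // KUBE-SERVICES marks sources outside the Pod CIDR
	case srcAddr == netip.MustParseAddr(endpoint):
		return true // KUBE-SEP marks a Pod that was load-balanced to itself
	}
	return false
}

func main() {
	const podCIDR = "1.1.0.0/16" // assumption, see the note above
	fmt.Println("external to node port:", needsMasquerade("3.3.3.3", podCIDR, true, "1.1.2.1"))
	fmt.Println("Pod A to Service:     ", needsMasquerade("1.1.1.1", podCIDR, false, "1.1.2.1"))
	fmt.Println("Pod B2 to own Service:", needsMasquerade("1.1.2.1", podCIDR, false, "1.1.2.1"))
}
```

Without SNAT in the self-Service case, Pod B2 would receive a packet whose source is its own address and would answer itself directly, so the reply would never be translated back to the Service IP.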
Q & A
References
▸ https://livebook.manning.com/#!/book/kubernetes-in-action/chapter-11
▸ https://github.com/kubernetes/community
▸ http://ebtables.netfilter.org/br_fw_ia/br_fw_ia.html
▸ https://github.com/inaz1502/kubernetes-internals
▸ https://github.com/sillim-programmer/kubernetes-in-action-study/tree/master/k8s-in-action-chap11
