Automated Container
Deployment on Kubernetes
An intro to kubernetes
David Chang
Linkernetworks
BackEnd, DevOps,
Docker, Kubernetes
dchang@linkernetworks.com
Outline
• What is Kubernetes
• Deploy a containerized app
• Deploy an app to Kubernetes: pros & cons
• Real world cases
What is Kubernetes
Kubernetes is an open-source system for
automating deployment, scaling, and
management of containerized applications
https://kubernetes.io/
Why containerize
Docker - Build, Ship, and Run Any App, Anywhere
Docker is an open platform for developers and
sysadmins to build, ship, and run distributed
applications, whether on laptops, data center VMs, or
the cloud.
https://www.docker.com/
Let's use Kubernetes
[Diagram: a Kubernetes cluster of three nodes: Node 1 (10.1.15.1), Node 2 (10.1.15.2), Node 3 (10.1.15.3); Pod 1 (App) waits to be deployed]
We want to deploy our app.
Deploy an app
[Diagram: the three-node cluster; Pod 1 (App) now runs on one of the nodes]
We want our app. We don't really care where it is.
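In practice, "I want my app, I don't care where" is expressed declaratively. A minimal sketch of a Deployment manifest — the name `my-app` and the image are placeholder assumptions, not from the talk:

```yaml
# Deployment: declare the desired state; the K8s scheduler picks the node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app            # placeholder name
spec:
  replicas: 1             # one pod for now
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:1.0   # placeholder image
        ports:
        - containerPort: 8080
```

Applied with `kubectl apply -f deployment.yaml`; Kubernetes decides which node runs the pod.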
Access an app
[Diagram: a Service (App Endpoint) sits in front of Pod 1 (App) inside the three-node cluster]
A Service gives the app a cluster-wide endpoint.
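The cluster-wide endpoint can be sketched as a Service that selects pods by label (same placeholder name, `my-app`, assumed):

```yaml
# Service: a stable, cluster-wide endpoint in front of matching pods.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app           # routes to any pod carrying this label
  ports:
  - port: 80              # port clients connect to
    targetPort: 8080      # port the container listens on
```

Inside the cluster the app is then reachable by name via K8s DNS, regardless of which node the pod landed on.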
Scale an app
[Diagram: the Service (App Endpoint) now fronts Pod 1 (App) and Pod 2 (App) on different nodes]
K8s finds a node to run the second pod.
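Scaling is just a change to the desired replica count. Assuming a Deployment named `my-app` (a placeholder), the relevant fragment is:

```yaml
# Deployment fragment: raise the desired replica count;
# K8s schedules the extra pod on any suitable node.
spec:
  replicas: 2   # was 1
```

The same change can be made imperatively with `kubectl scale deployment my-app --replicas=2`.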
Health check an app
[Diagram: one node fails its health check; the Service (App Endpoint) keeps routing traffic to the surviving pod]
Your service stays online with zero downtime.
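The automatic health checks behind this behavior are configured per container. A sketch of a liveness probe, assuming the app exposes a `/healthz` endpoint (an assumption, not from the talk):

```yaml
# Container-spec fragment: the kubelet restarts the container
# when this probe keeps failing.
livenessProbe:
  httpGet:
    path: /healthz        # assumed health endpoint
    port: 8080
  initialDelaySeconds: 5  # grace period after startup
  periodSeconds: 10       # probe interval
```

A `readinessProbe` is written the same way, but a failing pod is removed from the Service's endpoints instead of being restarted.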
Self-heal an app
[Diagram: the pod on the failed node is lost; K8s starts Pod 3 (App) on another node to restore the desired replica count]
Your service stays online with zero downtime.
Self-heal an app
[Diagram: when the failed node recovers, K8s can schedule new pods (Pod 4) on it again]
Your service stays online with zero downtime.
Keywords
• Clustering, distributed
• Automated deployment
• Auto scaling, Load balancing
• Health check, self-healing, zero downtime
Resource Management
[Diagram: the three-node cluster; two nodes have GPUs, one has storage; Job 1 (GPU) and Job 2 (GPU) wait to be scheduled]
One-time jobs: deploy, execute, save the results.
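Jobs declare what they need and the scheduler matches them to nodes. A container-level sketch of a GPU request (GPU scheduling assumes a device plugin such as NVIDIA's is installed on the nodes):

```yaml
# Container-spec fragment: the scheduler only places this container
# on a node that can satisfy every requested resource.
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    nvidia.com/gpu: 1   # extended resources like GPUs are set as limits
```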
Request Resource
[Diagram: Job 1 (GPU) and Job 2 (GPU) are placed on the GPU nodes; the storage-only node is skipped]
K8s shares the load between nodes.
Request Resource
[Diagram: Job 2 needs a GPU plus heavy disk I/O, so it lands on a node that has both a GPU and storage]
K8s shares the load between nodes; requests can be very dynamic.
Schedule Policy
[Diagram: both GPU jobs are placed on Node 1, which has GPUs and storage]
Node 1 is optimized for these jobs (high CPU, high disk I/O…).
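Placement policies like this are expressed with node labels. A sketch, assuming the label `hardware=gpu-optimized` has been applied to Node 1 by hand (a hypothetical label, not from the talk):

```yaml
# Pod-spec fragment: run only on nodes labeled as GPU-optimized.
# Label applied beforehand, e.g.:
#   kubectl label node node1 hardware=gpu-optimized
nodeSelector:
  hardware: gpu-optimized
```

`nodeAffinity` and pod anti-affinity offer softer, more expressive versions of the same idea.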
Queue
[Diagram: Job 1 and Job 2 occupy the available GPUs; Job 3 (GPU) and Job 4 (GPU) wait in the queue]
K8s tries to meet each resource request.
Queue
[Diagram: as Job 1 and Job 2 complete and release their GPUs, Job 3 and Job 4 are scheduled]
K8s tries to meet each resource request; resources are released when a job completes.
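One-time workloads map to the Job resource; a pod that can't get a GPU yet simply stays Pending until one is released. A sketch with placeholder names:

```yaml
# Job: run to completion, then release the GPU for the next queued job.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-job          # placeholder name
spec:
  backoffLimit: 2          # retry a failed pod up to twice
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: train
        image: registry.example.com/train:1.0   # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1
```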
Resources
[Diagram: jobs request a mix of pre-defined resources: GPU, CPU, memory, storage]
Keywords
• Resource request, availability check, auto scheduling
• Schedule policy, affinity, anti-affinity
• Job queue
• Label selector
Data Processing Job
• Tons of data, heavy IO
• GPU/CPU loading
• Storage elasticity
Kubernetes Cluster
[Diagram: a Kubernetes cluster combining CPU servers, GPU servers, and a storage system, exposed through a network storage system endpoint]
Data Processing Job
• Pre-processing
• Model training
• Inference
Public Cloud
• Google GKE, AWS EKS, Azure AKS…
• Auto scale servers (node pool)
• High integration with cloud services
• It won’t easily die
• Expensive GPUs
Bare metal servers
• Cheap (compared to cloud platforms)
• Highly customizable, optimizable
• Be prepared to embrace many system / networking / infrastructure issues
Thank you
• Automated deployment, scaling, containerization
• Build, deploy, distributed apps, anywhere
• Resource management, job queue, scheduling policy


Editor's Notes

  • #4 Share great tools for DevOps and deployment. Sounds like a sales pitch, but I saw tremendous change within my development team. Hard to maintain, but really good once you're using it. There is less "how" in this presentation.
  • #5 If your issues fit those keywords, try K8s. Released by Google in 2014; open source, maintained by the community, led by Google. Makes automated deployment and management easy. Containerize.
  • #6 Docker was released in 2013. Containerize: build, ship, run any app, anywhere. Widely used by public clouds/services like GCP. A uniform interface on top of the OS. Without containers, an app is like a tree rooted on your server; your engineers don't want to move it (bound ports, bound directories, bound libraries, bound kernel, bound OS distribution). With containers, the app becomes a replaceable part: easy to build, deploy, scale, monitor, heal, recover.
  • #7 Learning by using. Distributed system. Let's say we already have a K8s cluster for some reason. We want to deploy our app / training job / API server / DB…
  • #8 The user deploys an app. We want our app; we don't really care where it is. Kubernetes handles scheduling for you.
  • #9 K8s finds a node to deploy to; it can be any suitable node. Then how do we access it? K8s builds a cluster-wide endpoint (K8s networking, DNS).
  • #10 The user scales up the app deployment. K8s finds a node to run the second pod. Pods/containers are easy to ship, deploy, and scale.
  • #11 K8s discovers that Node 3 is down for some reason. High availability: automatic health checks and error discovery. Your service stays online with zero downtime.
  • #12 Your service stays online with zero downtime. K8s finds another suitable node to restore the desired app count, which is 2.
  • #13 When the node becomes available again (issue fixed), K8s automatically brings pods back. Your service returns to normal.
  • #14 Kubernetes is a distributed system.
  • #15 Let's talk about one-time jobs: deploy, execute, save results.
  • #16 Node 3 won't accept a GPU job.
  • #17 The requests are very dynamic.
  • #18 Node 1 is optimized to run GPU jobs (high CPU, high disk I/O, already has the data). Node affinity, node selector.
  • #19 K8s tries to meet resource requests.
  • #20 K8s tries to meet resource requests. Resources are released when a job completes. Your data scientists should focus on submitting jobs, not on waiting.
  • #21 Many pre-defined resources. Utilize resources for minimum cost. Label your nodes for more specific needs like bandwidth, IO, cost.
  • #23 It's all about data. Storage is another very long story. Access big data chunks from everywhere; you don't move thousands of GB. Backups? Replicas? Cumulative model feedback (reinforcement learning).
  • #24 Scale your cluster (data center). Containerize your training.
  • #25 Google's Kubernetes on GCP. Registry, network settings, load balancer. Global access.
  • #26 Cheap GPUs. Storage system.