Evolving for Kubernetes
Chris 'mac' McEniry
Agenda
• Some Items about Kubernetes

• Lift and Shift

• Evolving Applications for Kubernetes

• Changing People/Processes for Kubernetes
tl;dr
• Be ready for change

• As you evolve your processes/support, you have to provide backwards compatibility for both your infrastructure/applications and your processes

• Really focus on having a stable deployment mechanism

• Sort out your interaction contracts

• Start small and constrained - Realize that you won't do it "right"

• You will have to make application changes

• You will have to change your expectations

• Be ready for cutting edges

• Kubernetes has a very simple set of core primitives, and a lot of options to build on top

• CICD

• Authentication and Access Control

• Resource management - forcing the use of limits/requests

• Avoiding mixing goals
Perspective (Me)
• I'm a platform administrator. I make sure the clusters...

• are available,

• have resources,

• connect to the rest of the infrastructure.

• I run applications, but for the most part help app teams use the platform
Survey (You)
• Who's familiar with cfengine / puppet / chef / ansible / etc?

• Who's familiar with current containers?

• Who's familiar with Kubernetes? The object / resource model?

• Who's running Kubernetes in Production (even for the smallest workload)?
Some items about Kubernetes...
Declarative State
• Tell me what you want

• Not how to do it

• How to do it can change in different contexts

• "LoadBalancer" is slightly different in AWS, GCP, Azure

• All state stored in the API Server
Reconciliation Loop Driven
• Many specific independent actors

• Controllers

• Operators

• Actors reconcile declarative state with current state

• An actor can change declarative state for another actor and trigger its actions

• Main ones

• scheduler

• controller-manager

• kubelet
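A minimal sketch of what "reconciliation loop driven" means, assuming a toy in-memory stand-in for the API server. The names `Cluster`, `reconcile`, and `run_until_stable` are hypothetical and are not Kubernetes APIs; the point is only that an actor repeatedly compares declared state with current state and takes one small step toward closing the gap.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for the API server: holds declared and current state."""
    desired_replicas: int
    running_pods: list = field(default_factory=list)


def reconcile(cluster: Cluster) -> str:
    """One pass of a ReplicaSet-style actor: compare, then take one step."""
    diff = cluster.desired_replicas - len(cluster.running_pods)
    if diff > 0:
        cluster.running_pods.append(f"pod-{len(cluster.running_pods)}")
        return "created"
    if diff < 0:
        cluster.running_pods.pop()
        return "deleted"
    return "in-sync"


def run_until_stable(cluster: Cluster, max_steps: int = 100) -> list:
    """Call reconcile repeatedly until declared and current state agree."""
    actions = []
    for _ in range(max_steps):
        action = reconcile(cluster)
        actions.append(action)
        if action == "in-sync":
            break
    return actions
```

Changing `desired_replicas` and running the loop again is the toy equivalent of one actor editing declarative state and another actor reacting to it.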
Network Focused
• Interact with other applications over network
Networking
• Everything in the cluster is reachable from everything else

• (Policy might restrict)

• Magic Mappings.

• L4 Load Balancer

• DNS Mapping

• Map from outside cluster to inside
Resources
• What is used to define declarative state

• Stored in cluster (API Server)

• aka Manifests
Container
• Process

• Isolation: Namespaces

• Resource Management: CGroups

• Restriction: Capabilities
Pod
• Collection of containers working tightly together

• Unit of Scheduling

• Share network stack

• Can share disk mounts

• Sidecar: a support container running with the application container
Service
• Logical construct representing a network resource
ConfigMap
• (Static) Configuration data stored inside the cluster

• Can be exposed as environment variable(s) or files placed into the container
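From the application's side, the two exposure styles can be read with one small helper. This is a sketch under stated assumptions: the function name `load_setting`, the upper-cased environment variable naming, and the lookup order are illustrative choices, not anything Kubernetes mandates. It relies only on the fact that a ConfigMap mounted as a volume shows up as one file per key.

```python
import os
from pathlib import Path


def load_setting(name, conf_dir, default=None, env=os.environ):
    """Look up a setting: environment variable first, then a file named
    after the key inside the mounted conf dir, then the default."""
    key = name.upper()  # assumed convention: db_host -> DB_HOST
    if key in env:
        return env[key]
    path = Path(conf_dir) / name
    if path.is_file():
        return path.read_text().strip()
    return default
```

Usage might look like `load_setting("db_host", "/etc/app/conf", default="localhost")`, where the path is a hypothetical mount point.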
Higher Level Constructs
• Multiple Pods all the same ==> ReplicaSet

• Multiple ReplicaSets running an application (with canned rolling update
mechanism) ==> Deployment

• Sensitive ConfigMaps ==> Secret

• Different Service Implementations ==> Ingress
Controller -> Controller (via API Server)
[Diagram labels: Kubelet, Endpoint Controller, RS Controller, Deploy Controller; Deployment, ReplicaSet, Pod, Service, App Route]
Kubernetes is a Distributed System
for Building Distributed Systems
full of Independent Actors.
Lift and Shift
(Phase 1)
Goals
• Run an application inside of Kubernetes

• Change the code as little as possible

• Hook into the existing infrastructure as much as possible

• Keep it simple - avoid state, storage, etc

• Kick the tires
Starting Point
• Application Server based

• Takes requests in

• Talks to a Database in the Back
[Diagram labels: Host, App Server, CLIENT, DB]
Starting Point - Control
• Configuration

• Logs
[Diagram labels: Host, App Server, LOGS, CONFIG, CLIENT, DB]
[Diagram labels: Host, Tomcat, LOGSTASH, CHEF, CLIENT, DB]
App Design for Kubernetes
• Application Pod with Logstash Sidecar Container

• ConfigMap holding prerendered output from Chef. Mounted under conf dir

• Shared mount (emptyDir) for log output

• Written by app

• Read by logstash

• Service definition to map from outside to inside
[Diagram series, base labels: Tomcat, Config, LOGSTASH, CLIENT, DB. Successive callouts:]
• Pod: App Container, Logstash Sidecar Container
• Prerendered Conf stored in ConfigMap, mounted into app container for conf dir
• Shared mount for logs
• Service Definition
Team Processes for Kubernetes
• kubectl is the hammer

• Deploy using manifests (from source control) with kubectl apply -f

• Troubleshooting

• kubectl logs ...
• kubectl exec ...
Lift and Shift: Success!
• It worked!
Lessons Learned
• Startup takes its time (Tomcat startup)

• Debugging (kubectl exec)

• ConfigMap/Deployments only worked for one environment

• Healthchecks didn't fit well in the model and worked counter to
debugging steps
Next Goals
• Determine how to sustainably run applications inside of Kubernetes
Evolving Applications for Kubernetes
(Phase 2,3,4,5,6,7,8,9....)
https://www.redhat.com/cms/managed-files/cl-cloud-native-container-design-whitepaper-f8808kc-201710-v3-en.pdf
1. Single Concern Principle
• Do one thing (and do it well)

• Separation of Concerns

• Target updates

• Minimize (vertical) image sprawl
2. Image Immutability Principle
• Image is a delivery artifact with all of
the properties that an artifact should have

• "Build once, deploy everywhere"

• Don't layer configuration on as part of
image (unless you're putting *all*
foreseeable configuration possibilities
in there)
3. Self-Containment Principle
• Extension of Image Immutability

• On deployment, layer in instance
unique items (config, data)

• This uniqueness layer should be
specific to this instance
4. Runtime Confinement Principle
• Get an understanding of your
resource requirements

• And use them! (helps with scheduling)

• Without them ==>

• Unintentional, uninformed
oversubscription

• Roving micro-oversubscription
hotspots
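A toy illustration of the oversubscription point, assuming a single node and a scheduler that, like the Kubernetes scheduler, places pods by their declared requests rather than by what they actually use. The functions `fits` and `place` and all the numbers are hypothetical.

```python
def fits(node_capacity, scheduled_requests, new_request):
    """A pod fits if the sum of declared requests stays within capacity."""
    return sum(scheduled_requests) + new_request <= node_capacity


def place(pods, node_capacity):
    """Place pods on one node by declared request.

    pods: list of (declared_request, actual_usage) in CPU millicores.
    Returns (number of pods placed, total actual usage of placed pods).
    """
    placed = []
    for request, usage in pods:
        if fits(node_capacity, [r for r, _ in placed], request):
            placed.append((request, usage))
    actual = sum(u for _, u in placed)
    return len(placed), actual


# Four pods that each really use 1500m, on a 4000m node.
honest = place([(1500, 1500)] * 4, node_capacity=4000)     # requests declared
undeclared = place([(0, 1500)] * 4, node_capacity=4000)    # no requests set
```

With honest requests only two pods land on the node and usage stays at 3000m. With no requests all four land and actual usage reaches 6000m on a 4000m node, which is the unintentional, uninformed oversubscription described above.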
5. Process Disposability Principle
• Processes are ephemeral

• Be ready for them to not be there

• This will happen often (every change)
Containers (by themselves) are half suited for
Kubernetes
• Kubernetes builds on containers

• If you have been following container
modeling, that translates directly
Kubernetes Cluster Ecosystem
• Application process interacts with
the cluster
6. Life-Cycle Conformance Principle
• Figure out your timings (shutdown cleanly, startup)
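A minimal sketch of shutting down cleanly, assuming a simple request loop. Kubernetes sends SIGTERM to a container on termination and follows with SIGKILL once the grace period expires, so the process has a bounded window to finish in-flight work. The names `handle_sigterm`, `install_handlers`, and `serve` are illustrative.

```python
import signal
import threading

shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Record that shutdown was requested; the main loop does the rest."""
    shutting_down.set()


def install_handlers():
    """Register the SIGTERM handler (call once from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(handle_one_request, drain):
    """Handle requests until SIGTERM arrives, then drain and return."""
    while not shutting_down.is_set():
        handle_one_request()
    drain()  # e.g. flush logs, close DB connections
```

The drain step has to fit inside the pod's termination grace period, which is one of the timings worth measuring.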
7. High Observability Principle
• Change in behaviors

• Biggest change in thinking

• Forced thinking of items like health checks and monit et al

• Add to Disposability Principle - have to be able to debug quickly, over the network, and with remaining forensics
Changing People/Process for Kubernetes
Be Ready for Change
• Changed Deployment Strategy

• Single manifests -> Helm Charts

• Changed Helm Chart Structure 5 Times

• Changed Logging Infrastructure 3 Times

• BE VERY CAREFUL IN WHAT YOU STOP SUPPORTING
Contracts
• Describe what each side/component of processes will provide and accept

• Helps to define

• What can be changed without impacting others

• What needs to be talked about before changing
App Team Onboarding to Cluster
Columns: Kubernetes Cluster | Kubernetes Team | App Team / User
• -Receives App Definition | +Defines App Definition (name, resource count, users)
• -Receives namespace, RBAC | +Defines namespace, RBAC
• -Trusts central auth | +Logs in via central auth
• -Allows access to granted resources | +Accesses namespace
Logging Contract
Columns: App (Pod) | Kubernetes Cluster | Monitoring | App Team / User
• +Logs to STDOUT | -Receives from STDOUT
• +Transmits to Logging Bus | -Receives on Logging Bus
• +JSON Structured Log Format | -Handles JSON Format
• +Indexes in Search Tool | -Search in Search Tool
• +Infrastructure enrichments: pod, cluster, container host, environment, datacenter | -Search by infrastructure information
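The application's side of the logging contract (JSON structured lines on STDOUT) could be sketched as below. The field names `time`, `level`, `logger`, and `message` are assumptions; the slides do not specify a schema, and infrastructure enrichments are added downstream rather than by the app.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def make_logger(name, stream=sys.stdout):
    """Build a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # replace any existing handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False      # avoid duplicate output via the root logger
    return logger
```

Calling `make_logger("app").info("started")` then emits one JSON line on STDOUT for the cluster-side collector to pick up.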
Be Comfortable with Being Uncomfortable
• A lot of this technology is new/recent

• A lot of simple implementations (first pass)

• A lot of undiscovered bugs

• "Best practices" are highly localized
Simplified Primitives
• Deployments

• All at once (destroy + build)

• Rolling

• Load Balancing

• Only equal weight round robin (be it via L4 forwarding, or DNS)

• What's Layer7?
Common App Team Concerns
• How do I get onboarded?

• How does my application have to interact with the system?

• How do I run my application?

• How do I troubleshoot my application?

• How does it all work?!!?!?!
How do I run my application?
• Build an Application Template

• Dockerfile

• Helm Chart 

• Jenkinsfile

• Extend with organization-specific functions

• Partial Deploy functions

• Incorporate environment values
Debugging
• kubectl exec ...
How does it all work?!?!?!?!
• Boot camps
Current
[Diagram labels: Jar App, ConfigMap, CLIENT, DB, Secret Management, ConfigMap]
New Application Model
• Jar App (faster start up)

• zmetrics port (separate from client interface)

• Prometheus scrapes metrics

• Readiness/liveness probes

• Logs to STDOUT
New Deployment Model
• CICD Driven

• Standard format for repository

• Dockerfile, Chart --> artifacts

• Environment specific values

• Multiproject pipeline pushes to multiple environments with approval gates

• Automatic canary deploy, sanity check, then full deploy
Lessons Learned
• If you change too quickly, you will be in for a world of hurt

• Different ways to deploy

• Different Kubernetes API versions (v1alpha1, v1beta1, v1)

• You can't please everyone

• Tradeoffs

• Training - bootcamps and walking people through...

• Examples examples examples - easy to copy (cargo culting)
Health Checks
• Liveness probe: If this fails, Kubernetes will restart the
container.

• Readiness probe: If this fails, Kubernetes will take the
pod out of the service pool.

• If an app is bad, I should stop sending traffic to it and
recover it, right? ==> Ok to set these to the same
thing.
https://cloud.google.com/kubernetes-engine/kubernetes-comic/
"We DDOSed Ourselves!!!"
• On startup, application can be ready

• Gets flooded with traffic

• Kube restarts because liveness failed as well

• Quick fix: Removed liveness

• Real fix:

• Run liveness and readiness on a different port/connection threadpool/etc

• Know they mean different items
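One way the "real fix" might look: a small probe server on its own port and threads, answering liveness and readiness separately. This is a sketch, not the speaker's implementation. The paths `/livez` and `/readyz`, the default port 8081, and the names `AppState` and `make_probe_server` are all assumptions.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AppState:
    """Shared flag the application sets once it can take traffic."""

    def __init__(self):
        self.ready = threading.Event()


def make_probe_server(state, host="0.0.0.0", port=8081):
    """Build a probe server that is separate from the client-facing port."""

    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/livez":
                code = 200  # the process is up and able to answer
            elif self.path == "/readyz":
                code = 200 if state.ready.is_set() else 503
            else:
                code = 404
            self.send_response(code)
            self.end_headers()

        def log_message(self, *args):
            pass  # keep probe traffic out of the application log

    return ThreadingHTTPServer((host, port), ProbeHandler)
```

Because the probes do not share the client connection pool, a flood of client traffic can make the app unready (clear the flag) without also failing liveness and triggering a restart.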
• Prometheusbeat has limited support

• Security scanning checkbox

• Type:LoadBalancer Services (and anything built off them) get a permit *
ICMP Destination Unreachable (Type 3) - runs afoul of security policies

• Provide helper tools to set up configuration

• Login ==> can also gather cluster information like certificates and
endpoints
Q?
Obligatory - we're hiring...
Maybe Answers...
