High Availability
-----------------
Configure liveness and readiness probes
Provision at least 3 master nodes
Isolate etcd replicas
Have a plan for regular etcd backups
Distribute master nodes across zones
Distribute worker nodes across zones
Configure autoscaling for both master and worker nodes
Bake in HA load balancing for the API servers
Configure an active-passive (leader-elected) setup for the scheduler and controller manager
Configure the correct number of pod replicas for high availability
Don't spin up naked pods (pods not bound to a ReplicaSet or Deployment)
Set up federation for multiple clusters
Configure heartbeat and election timeout intervals for etcd members
Set up Ingress
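The advice to run at least three master nodes and to tune etcd timeouts follows from how etcd's Raft quorum works. The Python sketch below computes quorum size and failure tolerance, and checks heartbeat/election settings against rules of thumb as we read them in the etcd tuning guide: election timeout at least 5x the heartbeat interval, at least 10x the round-trip time between members, and no more than 50 s. Treat those thresholds as assumptions to verify against your etcd version.

```python
def quorum(members: int) -> int:
    """Smallest majority of voting members an etcd (Raft) cluster needs."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a quorum is still reachable."""
    return members - quorum(members)


def check_etcd_timeouts(heartbeat_ms: int, election_ms: int, rtt_ms: float) -> list[str]:
    """Warn about heartbeat/election settings.

    The thresholds are rules of thumb from our reading of the etcd tuning
    guide (assumed; verify against your etcd version).
    """
    warnings = []
    if election_ms < 5 * heartbeat_ms:
        warnings.append("election timeout should be at least 5x the heartbeat interval")
    if election_ms < 10 * rtt_ms:
        warnings.append("election timeout should be at least 10x the round-trip time between members")
    if election_ms > 50_000:
        warnings.append("election timeout exceeds the 50 s upper limit")
    return warnings


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} member(s): quorum {quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
    # etcd's defaults are a 100 ms heartbeat and a 1000 ms election timeout.
    print(check_etcd_timeouts(heartbeat_ms=100, election_ms=1000, rtt_ms=50))
```

Running it shows that 3 members tolerate 1 failure, 4 members still tolerate only 1, and 5 members tolerate 2, which is why odd member counts are preferred.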
Resource Management
-------------------
Configure resource requests and limits for containers
Specify resource requests and limits for local ephemeral storage
Create separate namespaces for your teams
Configure default resource requests and limits for namespaces
Configure LimitRanges for namespaces
Specify ResourceQuotas for namespaces
Configure pod and API object count quotas for namespaces
Ensure resource availability for etcd
Configure etcd snapshot memory usage
Attach labels to Kubernetes objects
Limit the number of pods that can run on a node
Reserve compute resources for system daemons
Configure API request processing for the API server (e.g. --max-requests-inflight, --max-mutating-requests-inflight)
Configure out-of-resource handling (kubelet eviction thresholds)
Use recommended settings for Persistent Volumes
Enable log rotation
Prevent the kubelet from setting or modifying label keys
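Reserving resources for system daemons changes what the scheduler can place on a node. The Kubernetes documentation defines node allocatable as capacity minus kube-reserved, minus system-reserved, minus the hard eviction threshold. The sketch below does that arithmetic, subtracting the eviction threshold from memory only. It understands just a subset of quantity suffixes, and the reservation values in the example are illustrative, not recommendations.

```python
from decimal import Decimal

# Only a subset of Kubernetes quantity suffixes is handled in this sketch.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_millicores(quantity: str) -> int:
    """'500m' -> 500, '2' -> 2000, '0.5' -> 500."""
    if quantity.endswith("m"):
        return int(Decimal(quantity[:-1]))
    return int(Decimal(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """'512Mi' -> 536870912, '1G' -> 1000000000, '1024' -> 1024."""
    for table in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in table.items():
            if quantity.endswith(suffix):
                return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))


def node_allocatable(capacity: dict, kube_reserved: dict, system_reserved: dict,
                     eviction_hard_memory: str) -> dict:
    """Allocatable = capacity - kube-reserved - system-reserved - eviction threshold.

    The hard eviction threshold is subtracted from memory only.
    """
    cpu = (cpu_millicores(capacity["cpu"])
           - cpu_millicores(kube_reserved["cpu"])
           - cpu_millicores(system_reserved["cpu"]))
    memory = (memory_bytes(capacity["memory"])
              - memory_bytes(kube_reserved["memory"])
              - memory_bytes(system_reserved["memory"])
              - memory_bytes(eviction_hard_memory))
    return {"cpu_millicores": cpu, "memory_bytes": memory}


if __name__ == "__main__":
    # Illustrative values for a 4-CPU, 16Gi node (not recommendations).
    result = node_allocatable(
        capacity={"cpu": "4", "memory": "16Gi"},
        kube_reserved={"cpu": "100m", "memory": "1Gi"},
        system_reserved={"cpu": "100m", "memory": "500Mi"},
        eviction_hard_memory="100Mi",  # e.g. --eviction-hard=memory.available<100Mi
    )
    print(result["cpu_millicores"], "m CPU,", result["memory_bytes"] // 2**20, "Mi memory")
```

For that example node, 3800m of CPU and 14760Mi of memory remain schedulable for pods.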
Security
--------
Use the latest Kubernetes version
Enable RBAC (Role-Based Access Control)
Follow user access best practices
Enable audit logging
Set up a bastion host
Enable the AlwaysPullImages admission controller
Define a pod security policy and enable it in the admission controller (the PodSecurityPolicy API was removed in Kubernetes 1.25 in favor of Pod Security Admission)
Choose a network plugin and configure network policies
Implement authentication for kubelet
Configure Kubernetes secrets
Enable data encryption at rest
Disable the default service account (at minimum, disable automounting of its token)
Scan containers for security vulnerabilities
Configure security context for pods, containers and volumes
Enable Kubernetes logging
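Several of the security items (security context, the default service account, AlwaysPullImages, network policies) end up as fields in manifests. As a sketch, the Python below builds a hardened Pod and a default-deny ingress NetworkPolicy as plain dicts and prints them as a JSON `List`, which `kubectl apply -f` accepts. The UID 10001, the image name and the namespace are hypothetical placeholders, and the NetworkPolicy only has an effect if your network plugin enforces network policies.

```python
import json


def hardened_pod(name: str, image: str) -> dict:
    """Pod manifest applying several of the security items above."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Don't mount a service account token unless the app calls the API.
            "automountServiceAccountToken": False,
            "securityContext": {
                "runAsNonRoot": True,
                "runAsUser": 10001,  # hypothetical non-root UID
                "fsGroup": 10001,
            },
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Pull on every start; AlwaysPullImages enforces this cluster-wide.
                    "imagePullPolicy": "Always",
                    "securityContext": {
                        "allowPrivilegeEscalation": False,
                        "readOnlyRootFilesystem": True,
                        "capabilities": {"drop": ["ALL"]},
                    },
                }
            ],
        },
    }


def default_deny_ingress(namespace: str) -> dict:
    """NetworkPolicy selecting every pod in `namespace` and allowing no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


if __name__ == "__main__":
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            hardened_pod("web", "registry.example.com/web:1.0"),  # hypothetical image
            default_deny_ingress("team-a"),  # hypothetical namespace
        ],
    }
    print(json.dumps(manifests, indent=2))
```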
Scalability
-----------
Configure the Horizontal Pod Autoscaler
Configure the Vertical Pod Autoscaler
Configure the Cluster Autoscaler
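The Horizontal Pod Autoscaler's core rule, as documented, is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). Scaling is skipped when that ratio is within a tolerance of 1.0 (0.1 by default), and the result is clamped to the HPA's minimum and maximum replicas. The sketch below reproduces only that rule; the real controller also handles missing metrics, pods that are not yet ready, and scale-down stabilization. The default bounds of 1 and 10 are example values.

```python
import math
from fractions import Fraction


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: Fraction = Fraction(1, 10)) -> int:
    """Core HPA scaling rule (simplified)."""
    # Exact rational arithmetic avoids float rounding right at the ceiling.
    ratio = Fraction(current_metric) / Fraction(target_metric)
    if abs(ratio - 1) <= tolerance:
        desired = current_replicas  # close enough to the target: don't scale
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the minReplicas / maxReplicas bounds of the HPA object.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Target 50% average CPU utilization, currently 90% across 4 pods.
    print(desired_replicas(4, current_metric=90, target_metric=50))  # 8
```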
Monitoring
----------
Set up a monitoring pipeline
Select a list of metrics to monitor
Maintenance
-----------
Ensure pods of the same application go to the same nodes
Use node and pod affinity/anti-affinity together with taints and tolerations
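One common way to keep an application's pods on a dedicated set of nodes combines the mechanisms named above: a taint keeps other pods off the nodes, a matching toleration lets this application's pods on, and required node affinity keeps them from landing anywhere else. The Python sketch builds such a Deployment as a dict, with a preferred pod anti-affinity to spread replicas across those nodes. The label/taint key `dedicated`, the value `payments` and the image are hypothetical, and the nodes are assumed to have been labeled and tainted beforehand.

```python
import json

# Hypothetical key used both as a node label and as a taint. Assumes nodes were prepared with:
#   kubectl label nodes node-1 dedicated=payments
#   kubectl taint nodes node-1 dedicated=payments:NoSchedule
DEDICATED_KEY = "dedicated"


def pinned_pod_spec(app: str, image: str) -> dict:
    return {
        # The toleration lets these pods onto the tainted nodes...
        "tolerations": [
            {"key": DEDICATED_KEY, "operator": "Equal", "value": app, "effect": "NoSchedule"}
        ],
        "affinity": {
            # ...and required node affinity keeps them off every other node.
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {"matchExpressions": [{"key": DEDICATED_KEY, "operator": "In", "values": [app]}]}
                    ]
                }
            },
            # Prefer spreading replicas across those nodes instead of stacking them on one.
            "podAntiAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "weight": 100,
                        "podAffinityTerm": {
                            "labelSelector": {"matchLabels": {"app": app}},
                            "topologyKey": "kubernetes.io/hostname",
                        },
                    }
                ]
            },
        },
        "containers": [{"name": app, "image": image}],
    }


def pinned_deployment(app: str, image: str, replicas: int = 3) -> dict:
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": app}},
            "template": {"metadata": {"labels": {"app": app}}, "spec": pinned_pod_spec(app, image)},
        },
    }


if __name__ == "__main__":
    print(json.dumps(pinned_deployment("payments", "registry.example.com/payments:1.0"), indent=2))
```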
Find faulty nodes in a Kubernetes cluster
Run node-problem-detector, which reports node problems to the API server as node conditions and events
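node-problem-detector surfaces problems as node conditions, so a quick way to spot faulty nodes is to scan the conditions in the output of `kubectl get nodes -o json`. In the sketch below, a node is flagged if `Ready` is anything other than `True`, or if any other condition (MemoryPressure, DiskPressure, or custom ones such as KernelDeadlock) is `True`. Treating every non-Ready condition as "True means a problem" is a simplifying assumption that holds for the built-in conditions but may not hold for every custom one.

```python
import json
import sys

# For these condition types "True" is healthy; for all others "True" signals a problem.
HEALTHY_WHEN_TRUE = {"Ready"}


def faulty_nodes(node_list: dict) -> dict[str, list[str]]:
    """Map node name -> problem descriptions, for nodes that have any problems."""
    problems: dict[str, list[str]] = {}
    for node in node_list.get("items", []):
        issues = []
        for cond in node.get("status", {}).get("conditions", []):
            ctype, status = cond["type"], cond["status"]
            healthy = (status == "True") if ctype in HEALTHY_WHEN_TRUE else (status != "True")
            if not healthy:
                reason = cond.get("reason")
                issues.append(f"{ctype}={status}" + (f" ({reason})" if reason else ""))
        if issues:
            problems[node["metadata"]["name"]] = issues
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get nodes -o json > nodes.json && python faulty_nodes.py nodes.json
    with open(sys.argv[1]) as f:
        for name, issues in faulty_nodes(json.load(f)).items():
            print(f"{name}: {', '.join(issues)}")
```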
How to remove a node from the cluster
1. Drain the node. Draino automatically drains Kubernetes nodes based on labels
and node conditions: nodes that match all of the supplied labels and any of the
supplied node conditions are immediately prevented from accepting new pods (aka
'cordoned'), and are drained after a configurable time.
2. Draino can be used in conjunction with the Cluster Autoscaler to automatically
terminate drained nodes.
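The steps above can also be done by hand with kubectl: cordon, drain, then delete the Node object. The sketch below prints (or, with `dry_run=False`, runs) those commands, and includes a simplified restatement of the Draino rule described above (all supplied labels match and at least one supplied condition is present). It is not Draino's code; the timeout value and node name are arbitrary, and `--delete-emptydir-data` is the flag name in recent kubectl releases (older releases call it `--delete-local-data`).

```python
import subprocess


def matches_draino_rule(node_labels: dict[str, str], true_conditions: set[str],
                        required_labels: dict[str, str], conditions: set[str]) -> bool:
    """Simplified Draino-style rule: ALL labels match and ANY listed condition is True."""
    labels_ok = all(node_labels.get(k) == v for k, v in required_labels.items())
    return labels_ok and bool(true_conditions & conditions)


def removal_commands(node: str, timeout: str = "5m") -> list[list[str]]:
    """kubectl steps for taking a node out of the cluster by hand."""
    return [
        # Stop new pods from being scheduled (drain would also do this).
        ["kubectl", "cordon", node],
        # Evict pods; DaemonSet pods are skipped and emptyDir data is deleted.
        ["kubectl", "drain", node, "--ignore-daemonsets",
         "--delete-emptydir-data", f"--timeout={timeout}"],
        # Remove the Node object from the API server.
        ["kubectl", "delete", "node", node],
    ]


def remove_node(node: str, dry_run: bool = True) -> None:
    for cmd in removal_commands(node):
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    remove_node("node-1")  # hypothetical node name; only prints the commands
```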
Ksonnet, Kustomize, Velero
Ksonnet is a framework for writing, sharing, and deploying Kubernetes application
manifests. With its CLI, you can generate a complete application from scratch in
only a few commands, or manage a complex system at scale. (The ksonnet project
has since been discontinued.)
Kustomize lets you customize raw, template-free YAML files for multiple purposes,
leaving the original YAML untouched and usable as is.
Velero (formerly Heptio Ark) has become the de facto standard backup tool for
Kubernetes clusters. Besides backing up cluster objects, it takes snapshots of
your cluster's Persistent Volumes using your cloud provider's block storage
snapshot features, and can then restore your cluster's objects and Persistent
Volumes.
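The idea behind Kustomize (a base manifest plus per-environment patches, with the base left untouched) can be illustrated with a toy deep merge. This is not Kustomize's algorithm: its strategic merge patches merge lists such as `containers` by name, whereas this sketch replaces lists wholesale. The Deployment and the `prod` overlay values are hypothetical.

```python
import copy


def overlay(base: dict, patch: dict) -> dict:
    """Merge `patch` into a deep copy of `base`; `base` itself is left untouched.

    Nested dicts are merged key by key; any other value (including a list)
    in `patch` replaces the value in `base`.
    """
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


if __name__ == "__main__":
    base = {  # hypothetical base Deployment
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "web", "labels": {"app": "web"}},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": "web"}},
            "template": {
                "metadata": {"labels": {"app": "web"}},
                "spec": {"containers": [{"name": "web", "image": "web:1.0"}]},
            },
        },
    }
    prod_patch = {"metadata": {"namespace": "prod"}, "spec": {"replicas": 5}}
    prod = overlay(base, prod_patch)
    print(prod["metadata"], prod["spec"]["replicas"])
    print(base["spec"]["replicas"])  # still 1: the base is unchanged
```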
Misc
----
Run an end-to-end (e2e) test
Map external services
Install the DNS add-on
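For mapping external services, one option is a Service of type `ExternalName`, which the cluster DNS add-on answers with a CNAME pointing at an outside hostname. The sketch below builds such a Service as a dict and computes the in-cluster name pods would use, assuming the default `cluster.local` cluster domain; the service name, namespace and hostname are hypothetical.

```python
import json


def external_service(name: str, namespace: str, external_host: str) -> dict:
    """Service whose in-cluster DNS name resolves (via CNAME) to an external host."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"type": "ExternalName", "externalName": external_host},
    }


def in_cluster_dns_name(name: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """DNS name pods use to reach the Service (assumes the default cluster domain)."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # Hypothetical: expose an external database as orders-db in namespace shop.
    print(json.dumps(external_service("orders-db", "shop", "db.example.com"), indent=2))
    print(in_cluster_dns_name("orders-db", "shop"))
```

`ExternalName` takes a hostname, not an IP address; for a fixed IP, a Service without a selector plus manually managed endpoints is the usual alternative.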
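Why we should not use Helm with Tiller in production
Tiller, the in-cluster server component of Helm 2, defaults to storing release
data (including the chart values supplied at install time) in ConfigMaps, i.e. in
plaintext, which is not secure. Helm 3 removed Tiller and stores release data in
Secrets by default.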