Cheatsheet: Kubernetes Monitoring
Cluster state metrics
Running pods
  kube-state-metrics: kube_pod_status_phase{phase="Running"}
  Command: kubectl get pods
Number of pods desired for a Deployment
  kube-state-metrics: kube_deployment_spec_replicas
  Command: kubectl get deployment <DEPLOYMENT>
Number of pods desired for a DaemonSet
  kube-state-metrics: kube_daemonset_status_desired_number_scheduled
  Command: kubectl get daemonset <DAEMONSET>
Number of pods currently running in a Deployment
  kube-state-metrics: kube_deployment_status_replicas
  Command: kubectl get deployment <DEPLOYMENT>
Number of pods currently running in a DaemonSet
  kube-state-metrics: kube_daemonset_status_current_number_scheduled
  Command: kubectl get daemonset <DAEMONSET>
Number of pods currently available in a Deployment
  kube-state-metrics: kube_deployment_status_replicas_available
  Command: kubectl get deployment <DEPLOYMENT>
Number of pods currently available in a DaemonSet
  kube-state-metrics: kube_daemonset_status_number_available
  Command: kubectl get daemonset <DAEMONSET>
Number of pods currently not available in a Deployment
  kube-state-metrics: kube_deployment_status_replicas_unavailable
  Command: kubectl get deployment <DEPLOYMENT>
Number of pods currently not available in a DaemonSet
  kube-state-metrics: kube_daemonset_status_number_unavailable
  Command: kubectl get daemonset <DAEMONSET>
Container metrics
Containers running on a pod
  kube-state-metrics: kube_pod_container_info
  Command: kubectl describe pod <POD_NAME>
Containers restarted on a pod
  kube-state-metrics: kube_pod_container_status_restarts_total
  Command: kubectl describe pod <POD_NAME>
Containers terminated on a pod
  kube-state-metrics: kube_pod_container_status_terminated
  Command: kubectl describe pod <POD_NAME>
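The Deployment entries pair a desired count (kube_deployment_spec_replicas) with an available count (kube_deployment_status_replicas_available), so a common check is whether available has fallen below desired. The sketch below assumes you have saved the text that kube-state-metrics serves on its /metrics endpoint. Its regex parser is a deliberate simplification (it ignores timestamps, unlabeled samples, and escaped quotes in label values), not a full Prometheus text-format parser, and the function names are hypothetical.

```python
import re

# One labeled sample line, e.g.
#   kube_deployment_spec_replicas{namespace="default",deployment="web"} 3
# Simplifying assumption: no timestamps, no escaped quotes in label values.
SAMPLE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each labeled sample line."""
    for line in text.splitlines():
        match = SAMPLE.match(line.strip())
        if match:  # skips blank lines, # HELP / # TYPE comments, unlabeled samples
            name, raw_labels, value = match.groups()
            yield name, dict(LABEL.findall(raw_labels)), float(value)


def deployments_below_desired(text):
    """Map (namespace, deployment) -> (desired, available) where available < desired."""
    desired, available = {}, {}
    for name, labels, value in parse_samples(text):
        key = (labels.get("namespace"), labels.get("deployment"))
        if name == "kube_deployment_spec_replicas":
            desired[key] = value
        elif name == "kube_deployment_status_replicas_available":
            available[key] = value
    return {
        key: (want, available.get(key, 0.0))
        for key, want in desired.items()
        if available.get(key, 0.0) < want
    }


# Hypothetical excerpt of a kube-state-metrics scrape.
scrape = """
# TYPE kube_deployment_spec_replicas gauge
kube_deployment_spec_replicas{namespace="default",deployment="web"} 3
kube_deployment_spec_replicas{namespace="default",deployment="api"} 2
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 1
kube_deployment_status_replicas_available{namespace="default",deployment="api"} 2
"""
print(deployments_below_desired(scrape))  # {('default', 'web'): (3.0, 1.0)}
```

The same comparison works for DaemonSets by swapping in kube_daemonset_status_desired_number_scheduled and kube_daemonset_status_number_available and keying on the daemonset label.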
Node resource and status metrics
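Current health status of a node (kubelet)
  kube-state-metrics: kube_node_status_condition
  Command: kubectl describe node <NODE_NAME>
Total memory requests (bytes) per node
  kube-state-metrics: kube_pod_container_resource_requests_memory_bytes
  Command: kubectl describe node <NODE_NAME>
Total memory in use on a node
  kube-state-metrics: N/A
  Command: kubectl top node <NODE_NAME>
Total CPU requests (cores) per node
  kube-state-metrics: kube_pod_container_resource_requests_cpu_cores
  Command: kubectl describe node <NODE_NAME>
Total CPU in use on a node
  kube-state-metrics: N/A
  Command: kubectl top node <NODE_NAME>
Note: kube-state-metrics v2.0 and later replace the two request metrics above with kube_pod_container_resource_requests{resource="memory"} and kube_pod_container_resource_requests{resource="cpu"}. These series are reported per container, so sum them by the node label to get per-node totals. kubectl describe node shows allocated requests and limits rather than live usage; kubectl top node reports usage but requires the Metrics API (for example, metrics-server).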
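Because the request metrics are emitted per container, a per-node total is a sum grouped by the node label. The sketch below assumes a saved kube-state-metrics scrape and accepts both the v1.x metric names listed above and the v2.x form kube_pod_container_resource_requests{resource="..."}. Real series carry more labels (namespace, container, unit); the sample trims them, and mixes both naming forms only to exercise both branches. The parser is simplified and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Simplified parser for labeled samples in the Prometheus text format
# (assumption: no timestamps and no escaped quotes inside label values).
SAMPLE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

# kube-state-metrics v1.x exposed one metric per resource; v2.x uses a single
# metric with a `resource` label. Both forms are accepted here.
LEGACY_NAMES = {
    "cpu": "kube_pod_container_resource_requests_cpu_cores",
    "memory": "kube_pod_container_resource_requests_memory_bytes",
}


def requests_per_node(text, resource):
    """Sum per-container requests for `resource` ('cpu' or 'memory') by node."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # comments, blank lines, unlabeled samples
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels))
        is_legacy = name == LEGACY_NAMES[resource]
        is_v2 = (name == "kube_pod_container_resource_requests"
                 and labels.get("resource") == resource)
        if is_legacy or is_v2:
            totals[labels.get("node", "")] += float(value)
    return dict(totals)


scrape = """
kube_pod_container_resource_requests{pod="web-1",node="node-a",resource="memory",unit="byte"} 268435456
kube_pod_container_resource_requests{pod="web-2",node="node-a",resource="memory",unit="byte"} 134217728
kube_pod_container_resource_requests{pod="web-1",node="node-a",resource="cpu",unit="core"} 0.25
kube_pod_container_resource_requests_cpu_cores{pod="api-1",node="node-b"} 0.5
"""
print(requests_per_node(scrape, "memory"))  # {'node-a': 402653184.0}
print(requests_per_node(scrape, "cpu"))     # {'node-a': 0.25, 'node-b': 0.5}
```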
Job metrics
Number of successful jobs
  kube-state-metrics: kube_job_status_succeeded
  Command: kubectl get jobs --all-namespaces | grep "succeeded"
Number of failed jobs
  kube-state-metrics: kube_job_status_failed
  Command: kubectl get jobs --all-namespaces | grep "failed"
Number of active jobs
  kube-state-metrics: kube_job_status_active
  Command: kubectl get jobs --all-namespaces
Number of CronJobs
  kube-state-metrics: kube_cronjob_info
  Command: kubectl get cronjobs --all-namespaces
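The grep-based commands depend on kubectl's human-readable table output. A sturdier approach, sketched below under the assumption that you feed it the output of `kubectl get jobs --all-namespaces -o json`, reads each Job's status directly: a finished Job carries a condition of type Complete or Failed whose status is "True", and status.active holds the number of running pods (the field is omitted when zero). This counts Jobs, whereas kube_job_status_succeeded and kube_job_status_failed report pod counts per Job. The function name is hypothetical.

```python
import json


def job_summary(kubectl_json):
    """Count succeeded, failed, and active Jobs in `kubectl get jobs -o json` output."""
    counts = {"succeeded": 0, "failed": 0, "active": 0}
    for job in json.loads(kubectl_json).get("items", []):
        status = job.get("status", {})
        # Condition types whose status is "True", e.g. {"Complete"} or {"Failed"}.
        true_conditions = {
            cond.get("type")
            for cond in status.get("conditions", [])
            if cond.get("status") == "True"
        }
        if "Complete" in true_conditions:
            counts["succeeded"] += 1
        elif "Failed" in true_conditions:
            counts["failed"] += 1
        # status.active is omitted when zero, so default to 0.
        if status.get("active", 0) > 0:
            counts["active"] += 1
    return counts


# Hypothetical, trimmed-down kubectl output with one Job in each state.
sample = json.dumps({"items": [
    {"status": {"succeeded": 1, "conditions": [{"type": "Complete", "status": "True"}]}},
    {"status": {"failed": 6, "conditions": [{"type": "Failed", "status": "True"}]}},
    {"status": {"active": 1}},
]})
print(job_summary(sample))  # {'succeeded': 1, 'failed': 1, 'active': 1}
```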
Service metrics
Service types per cluster
  kube-state-metrics: kube_service_info
  Command: kubectl get services --all-namespaces
Number of pods running by service
  kube-state-metrics: N/A
  Command: kubectl get pods --selector=<SERVICE_SELECTOR> -o=name
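The `--selector` approach works because a Service's spec.selector is an equality-based label selector: a pod belongs to the Service when its labels include every key=value pair in the selector. The sketch below mirrors that rule, assuming pod dicts shaped like the items of `kubectl get pods -o json`. It also keeps only pods whose status.phase is "Running", since the selector alone matches pods in any phase. Both helper names are hypothetical.

```python
def selector_string(selector):
    """Render a Service's spec.selector map as a kubectl --selector value."""
    return ",".join(f"{key}={value}" for key, value in sorted(selector.items()))


def running_pods_for_service(selector, pods):
    """Names of Running pods whose labels contain every key=value pair in `selector`."""
    if not selector:
        # A Service without a selector does not select pods automatically.
        return []
    matched = []
    for pod in pods:
        labels = pod.get("metadata", {}).get("labels", {})
        phase = pod.get("status", {}).get("phase")
        if phase == "Running" and all(labels.get(k) == v for k, v in selector.items()):
            matched.append(pod["metadata"]["name"])
    return matched


service_selector = {"app": "web", "tier": "frontend"}
pods = [
    {"metadata": {"name": "web-1", "labels": {"app": "web", "tier": "frontend", "version": "v2"}},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "web-2", "labels": {"app": "web", "tier": "frontend"}},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "api-1", "labels": {"app": "api"}},
     "status": {"phase": "Running"}},
]
print(selector_string(service_selector))                  # app=web,tier=frontend
print(running_pods_for_service(service_selector, pods))   # ['web-1']
```

Extra pod labels (such as version above) do not prevent a match; only missing or different selector keys do.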
Disk I/O & Network metrics
Network in per node
  Prometheus metric: container_network_receive_bytes_total
  Command: kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Network out per node
  Prometheus metric: container_network_transmit_bytes_total
  Command: kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Disk writes per node
  Prometheus metric: container_fs_writes_bytes_total
  Command: kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Disk reads per node
  Prometheus metric: container_fs_reads_bytes_total
  Command: kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Network errors per node
  Prometheus metrics: container_network_receive_errors_total, container_network_transmit_errors_total
  Command: kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
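Every metric in this table is a cumulative counter (note the _total suffix), so a per-node throughput figure comes from the change between two scrapes divided by the time between them, summed over the node's series. The sketch below is a simplified stand-in for Prometheus's rate() (no extrapolation), assuming you have already extracted counter values from two scrapes of /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor into dicts keyed by series. The function names and series keys are hypothetical; before summing, check whether your cAdvisor output also includes aggregate series (for example, entries with an empty container label) to avoid double counting.

```python
def counter_rate(previous, current, elapsed_seconds):
    """Per-second rate of a cumulative counter between two scrapes."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    increase = current - previous
    if increase < 0:
        # The counter went down, e.g. because the container restarted.
        # Assume it restarted from zero, so the whole current value is new.
        increase = current
    return increase / elapsed_seconds


def node_rate(previous_scrape, current_scrape, elapsed_seconds):
    """Sum per-series counter rates into one per-node rate.

    Both arguments map a series key (for example (pod, interface)) to the
    counter value read from one scrape. Series missing from the earlier
    scrape have no baseline yet and are skipped.
    """
    return sum(
        counter_rate(previous_scrape[key], value, elapsed_seconds)
        for key, value in current_scrape.items()
        if key in previous_scrape
    )


# Two hypothetical scrapes of container_network_receive_bytes_total, 10 s apart.
before = {("web-1", "eth0"): 1000.0, ("api-1", "eth0"): 5000.0}
after = {("web-1", "eth0"): 4000.0, ("api-1", "eth0"): 2000.0, ("new-1", "eth0"): 10.0}
print(node_rate(before, after, 10.0))  # 500.0
```

Here web-1 contributes 300 bytes/s, api-1 is treated as reset and contributes 200 bytes/s, and new-1 is skipped until a second scrape provides a baseline.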
Kubernetes events
List events
  Command: kubectl get events
Cheatsheet: Kubernetes Monitoring with Datadog
1. Cluster state metrics
METRIC DESCRIPTION | DATADOG STATUS CHECK/METRIC NAME
Running pods | kubernetes.pods.running
Number of pods desired for a Deployment | kubernetes_state.deployment.replicas_desired
Number of pods desired for a DaemonSet | kubernetes_state.daemonset.desired
Number of pods currently running in a Deployment | kubernetes_state.deployment.replicas
Number of pods currently running in a DaemonSet | kubernetes_state.daemonset.scheduled
Number of pods currently available in a Deployment | kubernetes_state.deployment.replicas_available
Number of pods currently available in a DaemonSet | kubernetes_state.daemonset.ready
Number of pods currently not available in a Deployment | kubernetes_state.deployment.replicas_unavailable
Number of pods currently not available in a DaemonSet | kubernetes_state.daemonset.desired - kubernetes_state.daemonset.ready
2. Node resource and status metrics
METRIC DESCRIPTION | DATADOG METRIC NAME
Current health status of a node (kubelet) | kubernetes.kubelet.check
Total memory requests (bytes) per node | kubernetes.memory.requests
Total memory in use on a node | kubernetes.memory.usage
Total CPU requests (cores) per node | kubernetes.cpu.requests
Total CPU in use on a node | kubernetes.cpu.usage.total
3. Job metrics
METRIC DESCRIPTION | DATADOG METRIC NAME
Number of successful jobs | kubernetes_state.job.succeeded
Number of failed jobs | kubernetes_state.job.failed
Number of active jobs | kubernetes_state.job.count
Number of CronJobs | kubernetes_state.job.count (filtered by the owner_kind:cronjob tag)
4. Service metrics
METRIC DESCRIPTION | DATADOG METRIC NAME
Service types per cluster | kubernetes_state.service.count
Number of pods running by service | kubernetes.pods.running
5. Container metrics
METRIC DESCRIPTION | DATADOG METRIC NAME
Containers running on a pod | kubernetes_state.container.running
Containers restarted on a pod | kubernetes_state.container.restarts
Containers terminated on a pod | kubernetes_state.container.terminated
6. Disk I/O & Network metrics
METRIC DESCRIPTION | DATADOG METRIC NAME
Network in per node | kubernetes.network.rx_bytes
Network out per node | kubernetes.network.tx_bytes
Disk writes per node | kubernetes.io.write_bytes
Disk reads per node | kubernetes.io.read_bytes
Network errors per node | kubernetes.network.rx_errors, kubernetes.network.tx_errors
7. Events
Kubernetes events appear in the Datadog Events Explorer and in event widgets on dashboards.