Kubernetes DaemonSet | Blue Matador

Unhealthy Daemon Sets

A daemon set is considered unhealthy when it does not have one pod running on every node it targets. Often the cause is a problem on the node where a pod is scheduled.
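As a rough illustration of that health check, the sketch below compares the counters Kubernetes reports in a daemon set's status. The field names (desiredNumberScheduled, numberReady, numberMisscheduled) come from the DaemonSet status object; the function name and the sample values are hypothetical, and the dictionary is assumed to be the "status" section of `kubectl get daemonset <name> -o json`.

```python
def daemonset_health(status: dict) -> dict:
    """Summarize daemon set health from its .status counters."""
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    misscheduled = status.get("numberMisscheduled", 0)
    return {
        # Healthy here means: every node that should run the pod has a ready
        # pod, and no pod runs on a node where it should not.
        "healthy": desired == ready and misscheduled == 0,
        "missing": max(desired - ready, 0),
        "misscheduled": misscheduled,
    }


# Hypothetical status for a five-node cluster with two pods not ready.
example_status = {
    "desiredNumberScheduled": 5,
    "currentNumberScheduled": 5,
    "numberReady": 3,
    "numberMisscheduled": 0,
}
print(daemonset_health(example_status))
```

A non-zero "missing" count is the signal to look at the individual pods, which is where the causes below come in.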

Common causes of an unhealthy daemon set:

Pod stuck in a crash loop
Pending pods

Quick Fix

If a pod is stuck in a crash loop, it may just be running out of resources. Revisit the spec and see if increasing the CPU or memory request and limit values allows the pod to run for longer. You can then fully troubleshoot the pod by checking its logs. If resource usage is okay, investigate the command being run by the pod. If the container terminates sooner than expected, check which image is used in the spec to make sure it is the right one.
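The triage order above can be sketched as a small function over one entry of a pod's status.containerStatuses list. The field names and the CrashLoopBackOff and OOMKilled reasons are the ones Kubernetes reports; the function name, the returned messages, and the sample entry are hypothetical.

```python
def diagnose_container(container_status: dict) -> str:
    """Suggest a next step for one container status entry."""
    waiting = container_status.get("state", {}).get("waiting") or {}
    last = container_status.get("lastState", {}).get("terminated") or {}

    if waiting.get("reason") != "CrashLoopBackOff":
        return "not in a crash loop"
    if last.get("reason") == "OOMKilled":
        # The previous run was killed for exceeding its memory limit.
        return "out of memory: raise the memory request and limit"
    if last.get("exitCode") == 0:
        # The process finished on its own, which a daemon should not do.
        return "exited cleanly: check the command and image in the spec"
    return "crashed: check the logs of the previous container"


# Hypothetical entry, shaped like `kubectl get pod <name> -o json` output.
example_container = {
    "name": "agent",
    "restartCount": 7,
    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
    "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
}
print(diagnose_container(example_container))
```

This only covers the cases mentioned in the text; other termination reasons fall through to the generic advice to read the logs.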

If one or more pods in the daemon set are pending, there may not be enough resources to schedule a pod on each node. This can be resolved by:

Lowering the requested CPU and memory of the daemon set
Moving some pods off the affected nodes to free up resources
Scaling up the nodes to accommodate the pods from the daemon set
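To see why a pod stays pending, it can help to do the scheduler's arithmetic by hand. The sketch below is a simplified version under stated assumptions: it only understands CPU quantities such as "250m" or "2" and memory quantities with Ki, Mi, or Gi suffixes (or plain bytes), and it considers CPU and memory only. All function names and sample numbers are hypothetical.

```python
MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_cpu(quantity: str) -> int:
    """Return CPU in millicores for forms like "250m", "2", or "0.5"."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def parse_memory(quantity: str) -> int:
    """Return memory in bytes for Ki/Mi/Gi suffixes or a plain byte count."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def fits_on_node(requests: dict, allocatable: dict, already_requested: dict) -> bool:
    """Check whether a pod's requests fit into what is left on a node."""
    free_cpu = parse_cpu(allocatable["cpu"]) - parse_cpu(already_requested["cpu"])
    free_mem = parse_memory(allocatable["memory"]) - parse_memory(already_requested["memory"])
    return (parse_cpu(requests["cpu"]) <= free_cpu
            and parse_memory(requests["memory"]) <= free_mem)


# Hypothetical node with 100m CPU and 1Gi memory left unrequested.
node_allocatable = {"cpu": "2", "memory": "4Gi"}
node_requested = {"cpu": "1900m", "memory": "3Gi"}

print(fits_on_node({"cpu": "200m", "memory": "256Mi"}, node_allocatable, node_requested))
print(fits_on_node({"cpu": "100m", "memory": "256Mi"}, node_allocatable, node_requested))
```

In this example the first request is too large for the remaining CPU, while the lowered request fits, which corresponds to the first option in the list. Moving other pods away lowers the already-requested totals, and scaling up the nodes raises the allocatable totals.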

Thorough Fix

To prevent a daemon set from running on certain nodes, you can modify the node's taints or the daemon set's tolerations. This is useful to prevent a daemon set from targeting specialized nodes that may not have enough resources.
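The interaction between taints and tolerations can be sketched as follows. This is a simplified model of the matching rules described in the Kubernetes documentation (Equal and Exists operators, with an empty effect or key acting as a wildcard), not the scheduler's actual code; the function names and the "dedicated=gpu" taint are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    key = toleration.get("key", "")
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with Exists matches every taint key.
        return key == "" or key == taint.get("key")
    return key == taint.get("key") and toleration.get("value", "") == taint.get("value", "")


def can_schedule(tolerations: list, taints: list) -> bool:
    """Return True if every hard taint on the node is tolerated."""
    # PreferNoSchedule is only a preference, so it does not block scheduling.
    blocking = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


# Hypothetical specialized node that the daemon set should stay away from.
gpu_node_taints = [{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}]

print(can_schedule([], gpu_node_taints))
print(can_schedule([{"key": "dedicated", "operator": "Exists"}], gpu_node_taints))
```

With no matching toleration the daemon set's pod is kept off the tainted node; adding the toleration lets it schedule there again. Note that this sketch only looks at tolerations supplied in the list it is given.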

If you do not actually need the one-pod-per-node functionality of a daemon set, consider using a deployment instead, which offers more flexibility over how many pods run and where they run.
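As a starting point for such a switch, the sketch below builds a deployment manifest from a daemon set manifest by reusing its selector and pod template and adding an explicit replica count. It is a minimal sketch: the function name and the sample manifest are hypothetical, daemon-set-specific settings such as updateStrategy are simply dropped, and the result would still need review before being applied.

```python
import copy


def daemonset_to_deployment(daemonset: dict, replicas: int) -> dict:
    """Build a Deployment manifest that reuses a DaemonSet's pod template."""
    metadata = daemonset["metadata"]
    spec = daemonset["spec"]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": metadata["name"],
            "namespace": metadata.get("namespace", "default"),
            "labels": copy.deepcopy(metadata.get("labels", {})),
        },
        "spec": {
            # A deployment needs an explicit pod count; a daemon set does not.
            "replicas": replicas,
            "selector": copy.deepcopy(spec["selector"]),
            "template": copy.deepcopy(spec["template"]),
        },
    }


# Hypothetical daemon set manifest, trimmed to the fields used here.
example_daemonset = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "log-agent", "labels": {"app": "log-agent"}},
    "spec": {
        "selector": {"matchLabels": {"app": "log-agent"}},
        "updateStrategy": {"type": "RollingUpdate"},
        "template": {
            "metadata": {"labels": {"app": "log-agent"}},
            "spec": {"containers": [{"name": "agent", "image": "example/agent:1.0"}]},
        },
    },
}

deployment = daemonset_to_deployment(example_daemonset, replicas=3)
print(deployment["kind"], deployment["spec"]["replicas"])
```

The replica count of 3 is arbitrary; choosing it is exactly the flexibility a deployment adds over a daemon set.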
