A Kubernetes daemon set manages the scheduling and lifecycle of pods so that exactly one copy of a pod runs on each node it targets. That is every node in the cluster unless the daemon set is restricted to a subset of nodes.


Unhealthy Daemon Sets


When a daemon set does not have one ready pod running on every node it targets, it is considered unhealthy. Often the cause is an issue with the node that a failing pod is scheduled on.

Common causes of an unhealthy daemon set:

  • Pods stuck in a crash loop
  • Pods stuck in the Pending state
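
The health condition above can be read from a daemon set's status fields. The following Python sketch assumes the daemon set has been fetched as JSON (for example with kubectl get daemonset <name> -o json) and compares desiredNumberScheduled with currentNumberScheduled, numberReady and numberMisscheduled. The function name daemonset_problems is hypothetical.

```python
def daemonset_problems(ds: dict) -> list[str]:
    """List reasons a DaemonSet looks unhealthy; an empty list means healthy.

    `ds` is assumed to be a DaemonSet object parsed from JSON, for example
    the output of `kubectl get daemonset <name> -o json`.
    """
    status = ds.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)  # nodes that should run a pod
    scheduled = status.get("currentNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    misscheduled = status.get("numberMisscheduled", 0)

    problems = []
    if scheduled < desired:
        problems.append(f"{desired - scheduled} node(s) without a scheduled pod")
    if ready < desired:
        problems.append(f"{desired - ready} pod(s) not ready")
    if misscheduled > 0:
        problems.append(f"{misscheduled} pod(s) on nodes they should not run on")
    return problems


# Example: three target nodes, one pod not ready (e.g. crash looping or pending).
example = {
    "status": {
        "desiredNumberScheduled": 3,
        "currentNumberScheduled": 3,
        "numberReady": 2,
        "numberMisscheduled": 0,
    }
}
print(daemonset_problems(example))  # ['1 pod(s) not ready']
```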

 

Quick Fix


If a pod is stuck in a crash loop, it may simply be running out of resources; a container that exceeds its memory limit, for example, is killed and reported as OOMKilled. Revisit the spec and see whether increasing the CPU or memory requests and limits lets the pod run for longer. You can then troubleshoot the pod fully by checking its logs. If resource usage looks normal, investigate the command the pod runs. If the container exits sooner than expected, check which image the spec uses to make sure it is the right one.
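
For a crash-looping pod, the container status usually records why the last run ended. This sketch assumes the pod has been fetched as JSON (kubectl get pod <name> -o json) and reads state.waiting.reason and lastState.terminated from each entry in status.containerStatuses. The helper name crash_loop_hints and the container name agent are illustrative only.

```python
def crash_loop_hints(pod: dict) -> list[str]:
    """Suggest a next step for each container in CrashLoopBackOff.

    `pod` is assumed to be a Pod object parsed from JSON, for example the
    output of `kubectl get pod <name> -o json`.
    """
    hints = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting", {})
        if waiting.get("reason") != "CrashLoopBackOff":
            continue  # this container is not crash looping
        name = cs.get("name", "<unknown>")
        last = cs.get("lastState", {}).get("terminated", {})
        reason = last.get("reason")
        if reason == "OOMKilled":
            # The container exceeded its memory limit and was killed.
            hints.append(f"{name}: OOMKilled, try raising memory requests/limits")
        elif reason == "Completed":
            # Exit code 0, but daemon set pods are always restarted, so it loops.
            hints.append(f"{name}: exited early, check the command and image")
        else:
            hints.append(f"{name}: exit code {last.get('exitCode')}, check its logs")
    return hints


# Hypothetical pod whose "agent" container keeps hitting its memory limit.
pod = {"status": {"containerStatuses": [{
    "name": "agent",
    "restartCount": 7,
    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
    "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
}]}}
print(crash_loop_hints(pod))  # ['agent: OOMKilled, try raising memory requests/limits']
```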

If one or more pods in the daemon set are pending, some nodes may not have enough free resources to schedule the pod. This can be resolved by:

  • Lowering the CPU and memory requests of the daemon set
  • Moving some pods off the affected nodes to free up resources
  • Scaling up the nodes to accommodate the pods from the daemon set
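
The scheduler places a pod on a node only if the pod's resource requests fit within the node's allocatable capacity minus what the pods already on that node request. Below is a minimal sketch of that comparison, assuming the node's status.allocatable map and the total of existing requests have been collected separately. parse_quantity handles only common suffixes and is not a full Kubernetes quantity parser; all names here are hypothetical.

```python
from fractions import Fraction

# A subset of Kubernetes quantity suffixes; exponent forms such as "1e3"
# and the E/P/Ei/Pi suffixes are not handled in this sketch.
_SUFFIXES = {
    "Ki": Fraction(2**10), "Mi": Fraction(2**20),
    "Gi": Fraction(2**30), "Ti": Fraction(2**40),
    "k": Fraction(10**3), "M": Fraction(10**6),
    "G": Fraction(10**9), "T": Fraction(10**12),
    "m": Fraction(1, 1000),  # milli-units, e.g. "500m" CPU
}


def parse_quantity(q: str) -> Fraction:
    """Convert a quantity string such as "500m" or "256Mi" to an exact number."""
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Fraction(q[: -len(suffix)]) * factor
    return Fraction(q)  # plain number, e.g. "2" CPUs


def missing_resources(pod_requests: dict, allocatable: dict, requested: dict) -> list[str]:
    """Return the resources a node lacks for one more pod; empty means it fits.

    allocatable: the node's status.allocatable map.
    requested:   sum of requests of pods already on the node (assumed to be
                 gathered separately, e.g. from `kubectl describe node`).
    """
    short = []
    for resource, amount in pod_requests.items():
        free = (parse_quantity(allocatable.get(resource, "0"))
                - parse_quantity(requested.get(resource, "0")))
        if parse_quantity(amount) > free:
            short.append(resource)
    return short


node_alloc = {"cpu": "2", "memory": "4Gi"}
node_used = {"cpu": "1800m", "memory": "1Gi"}
print(missing_resources({"cpu": "500m", "memory": "256Mi"}, node_alloc, node_used))  # ['cpu']
print(missing_resources({"cpu": "200m", "memory": "256Mi"}, node_alloc, node_used))  # []
```

In this example only 200m of CPU is free on the node, so lowering the daemon set's CPU request from 500m to 200m, the first fix in the list above, is enough for the pod to fit.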

 

Thorough Fix


To prevent a daemon set from running on certain nodes, you can modify the node's taints or the daemon set's tolerations so that those nodes carry a taint the daemon set's pods do not tolerate. This is useful for keeping a daemon set off specialized nodes that may not have enough resources.
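
Whether a taint keeps a daemon set off a node depends on how tolerations match taints. The sketch below follows the documented matching rules: an empty effect matches any effect, an empty key with operator Exists matches any key, Exists ignores the value, and Equal (the default operator) compares values. It treats PreferNoSchedule as a soft preference and ignores the tolerations that the DaemonSet controller adds automatically. The taint key dedicated and the function names are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False  # an empty effect matches every effect
    key = toleration.get("key", "")
    if key and key != taint.get("key"):
        return False  # an empty key (with Exists) matches every key
    if toleration.get("operator", "Equal") == "Exists":
        return True  # Exists ignores the taint's value
    return toleration.get("value", "") == taint.get("value", "")


def can_schedule(node_taints: list, pod_tolerations: list) -> bool:
    """Return True if no hard (NoSchedule/NoExecute) taint blocks the pod."""
    for taint in node_taints:
        if taint.get("effect") == "PreferNoSchedule":
            continue  # soft: the scheduler avoids the node but may still use it
        if not any(tolerates(t, taint) for t in pod_tolerations):
            return False
    return True


# Hypothetical taint applied to specialized GPU nodes.
gpu_taints = [{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}]
print(can_schedule(gpu_taints, []))  # False: the daemon set stays off these nodes
match = {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
print(can_schedule(gpu_taints, [match]))  # True: a matching toleration lets it run
```

Leaving the matching toleration out of the daemon set's pod template is what keeps it off the tainted nodes.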

If you do not actually need the one-pod-per-node behavior of a daemon set, consider using a deployment instead; it offers more flexibility over how many pods run and where they run.

 

RESOURCES