A Kubernetes node is a physical server or virtual machine (VM) that runs pods. A node's health is determined by its condition checks and by how much of its CPU, memory, and pod capacity is in use.

Node Conditions

Conditions describe the health of several key node metrics and attributes, and they also determine whether pods can be scheduled onto the node. The Kubernetes documentation describes node conditions in full; they are summarized below for convenience. Note that not all versions of Kubernetes expose every node condition.

  • OutOfDisk: If unhealthy, then there is not enough disk space for new pods
  • Ready: If unhealthy, then the node will not accept new pods
  • MemoryPressure: If unhealthy, then node memory is low
  • PIDPressure: If unhealthy, then there are too many processes on the node
  • DiskPressure: If unhealthy, then disk capacity is low
  • NetworkUnavailable: If unhealthy, then the network for the node is misconfigured
  • ConfigOK: If unhealthy, then the kubelet is misconfigured
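As a rough illustration of the "If unhealthy" rule above, the Python sketch below checks the conditions list that Kubernetes reports at .status.conditions for a node (for example in the output of kubectl get node <name> -o json). Each entry has a type and a status of "True", "False", or "Unknown". Ready is healthy only when its status is "True", while the pressure-style conditions are healthy when "False". The HEALTHY_STATUS table and the function name are hypothetical, and condition types missing from the table (such as ConfigOK) are simply skipped.

```python
# Minimal sketch: flag unhealthy node conditions.
# Assumes `conditions` is the list from a node's .status.conditions,
# where each entry is a dict with "type" and "status" keys.

# Hypothetical table: the status each condition type has when healthy.
HEALTHY_STATUS = {
    "Ready": "True",
    "OutOfDisk": "False",
    "MemoryPressure": "False",
    "PIDPressure": "False",
    "DiskPressure": "False",
    "NetworkUnavailable": "False",
}


def unhealthy_conditions(conditions):
    """Return the types of known conditions that are not in their healthy state."""
    problems = []
    for cond in conditions:
        healthy = HEALTHY_STATUS.get(cond["type"])
        if healthy is None:
            continue  # condition type not covered by this sketch
        # "Unknown" never matches the healthy value, so it is flagged too.
        if cond["status"] != healthy:
            problems.append(cond["type"])
    return problems


conditions = [
    {"type": "Ready", "status": "True"},
    {"type": "MemoryPressure", "status": "True"},
    {"type": "DiskPressure", "status": "False"},
]
print(unhealthy_conditions(conditions))  # ['MemoryPressure']
```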


Node Resources

Each node has a finite capacity of CPU and memory that can be allocated to running pods. To quickly see resource usage per node in your cluster, run kubectl describe nodes or, if your cluster has Heapster (or its replacement, metrics-server) installed, kubectl top nodes.

CPU is measured in CPU units, where one unit is equivalent to one vCPU, vCore, or core, depending on your cloud provider. Many pods do not need an entire CPU and instead request CPU in millicpu: there are 1000 millicpu, written 1000m, in one CPU unit.
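To make the unit conversion concrete, here is a minimal Python sketch that turns a CPU quantity string into millicpu. It assumes only the two forms described above: a value with an "m" suffix (already millicpu) or a plain whole or fractional number of CPU units. Other forms of the Kubernetes quantity syntax are not handled, and the function name is hypothetical.

```python
from decimal import Decimal


def cpu_to_millicpu(quantity: str) -> int:
    """Convert a CPU quantity such as "250m", "2" or "0.5" to millicpu."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # Already in millicpu, e.g. "250m" means 250 millicpu.
        return int(quantity[:-1])
    # Plain CPU units (whole or fractional): 1 CPU = 1000 millicpu.
    return int(Decimal(quantity) * 1000)


for q in ("250m", "2", "0.5"):
    print(q, "->", cpu_to_millicpu(q))
```

Decimal is used instead of float so that values such as "0.1" convert exactly rather than picking up binary rounding error.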

Memory is measured in bytes but is usually written in a more readable form, such as 128Mi (128 mebibytes).
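A similar sketch can convert memory quantities to bytes. It assumes an integer followed by an optional suffix: binary suffixes (Ki, Mi, Gi, Ti, Pi, Ei) are powers of 1024, and decimal suffixes (k, M, G, T, P, E) are powers of 1000. Fractional values and exponent notation are left out to keep the example short, and the names are illustrative only.

```python
import re

# Multipliers for memory suffixes: binary (powers of 1024) and decimal (powers of 1000).
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3,
    "Ti": 1024 ** 4, "Pi": 1024 ** 5, "Ei": 1024 ** 6,
    "k": 1000, "M": 1000 ** 2, "G": 1000 ** 3,
    "T": 1000 ** 4, "P": 1000 ** 5, "E": 1000 ** 6,
    "": 1,  # no suffix: plain bytes
}

_PATTERN = re.compile(r"^(\d+)([A-Za-z]*)$")


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" or "1G" to bytes."""
    match = _PATTERN.match(quantity.strip())
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix]


print(memory_to_bytes("128Mi"))  # 134217728
```

Note the difference between the two families: 1Gi is 1,073,741,824 bytes, while 1G is 1,000,000,000 bytes.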

To determine whether a node is near its capacity, compare the node's capacity with the total CPU and memory requests, and with the total CPU and memory limits, configured for all pods on that node.
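The comparison above can be sketched in Python as follows. The PodResources type is hypothetical: it assumes each pod's requests and limits have already been summed over its containers and normalized to millicpu and bytes. The function reports each total as a percentage of the node's capacity.

```python
from dataclasses import dataclass
from typing import Dict, List

GI = 1024 ** 3  # bytes in one gibibyte (Gi)


@dataclass
class PodResources:
    """Hypothetical per-pod totals, summed over the pod's containers.

    CPU values are in millicpu; memory values are in bytes.
    """
    cpu_request: int
    cpu_limit: int
    memory_request: int
    memory_limit: int


def node_usage_percent(pods: List[PodResources], cpu_capacity: int,
                       memory_capacity: int) -> Dict[str, float]:
    """Express summed requests and limits as a percentage of node capacity."""
    totals = {
        "cpu_requests": sum(p.cpu_request for p in pods),
        "cpu_limits": sum(p.cpu_limit for p in pods),
        "memory_requests": sum(p.memory_request for p in pods),
        "memory_limits": sum(p.memory_limit for p in pods),
    }
    percentages = {}
    for key, total in totals.items():
        capacity = cpu_capacity if key.startswith("cpu") else memory_capacity
        percentages[key] = 100.0 * total / capacity
    return percentages


pods = [
    PodResources(cpu_request=500, cpu_limit=1000, memory_request=1 * GI, memory_limit=2 * GI),
    PodResources(cpu_request=1500, cpu_limit=2000, memory_request=1 * GI, memory_limit=4 * GI),
]
print(node_usage_percent(pods, cpu_capacity=4000, memory_capacity=8 * GI))
# {'cpu_requests': 50.0, 'cpu_limits': 75.0, 'memory_requests': 25.0, 'memory_limits': 75.0}
```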

A pod's resource requests determine whether it can be scheduled onto a node: the scheduler only places a pod where its requests fit in what the other pods' requests leave free, so the total requested on a node can never exceed the node's capacity. When the requested CPU or memory on a node is close to its capacity, pods whose requests exceed the remaining amount can no longer be scheduled there. Typically, a workload is spread across several nodes in the cluster, and most nodes are expected to have roughly the same amount of CPU and memory requested.
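A simplified version of the resource-fit check can be sketched in Python. This models only the request arithmetic described above; the real kube-scheduler compares against the node's allocatable resources (capacity minus system reservations) and applies many other filters. Each tuple is a hypothetical (millicpu, bytes) pair.

```python
from typing import List, Tuple

GI = 1024 ** 3  # bytes in one gibibyte (Gi)


def fits_on_node(new_request: Tuple[int, int],
                 scheduled: List[Tuple[int, int]],
                 capacity: Tuple[int, int]) -> bool:
    """Each tuple is (cpu_millicpu, memory_bytes).

    Returns True if the new pod's requests fit in what the requests of
    the already scheduled pods leave free on the node.
    """
    free_cpu = capacity[0] - sum(cpu for cpu, _ in scheduled)
    free_mem = capacity[1] - sum(mem for _, mem in scheduled)
    return new_request[0] <= free_cpu and new_request[1] <= free_mem


node_capacity = (4000, 8 * GI)
already_scheduled = [(2000, 4 * GI), (1500, 2 * GI)]  # leaves 500m and 2Gi free
print(fits_on_node((500, 1 * GI), already_scheduled, node_capacity))  # True
print(fits_on_node((600, 1 * GI), already_scheduled, node_capacity))  # False
```

Only requests enter the check; limits play no part in scheduling, which is why a node can be full for scheduling purposes while its actual usage is low.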

A pod's resource limits set the maximum amount of each resource the pod may use if it needs to. It is common for the total of the limits on a node to exceed the node's capacity when pod requests and limits have not been tuned to the applications they run. A node whose total limits are much higher than its capacity is over-committed. It is considered unhealthy because node performance can degrade if those pods actually start using resources up to their limits. The easiest way to avoid an over-committed node is to set each pod's limits equal to its requests.
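One way to put a number on over-commitment is the ratio of total limits to capacity for a single resource. In the Python sketch below, the 1.5 cut-off for "much higher than the capacity" is a hypothetical choice, not a Kubernetes setting, and pods are given as (request, limit) pairs in the same unit as the capacity.

```python
from typing import List, Tuple

# Hypothetical cut-off for "much higher than capacity"; not a Kubernetes setting.
OVERCOMMIT_THRESHOLD = 1.5


def limit_to_capacity_ratio(pods: List[Tuple[int, int]], capacity: int) -> float:
    """pods holds (request, limit) pairs for one resource, in the same unit as capacity."""
    return sum(limit for _, limit in pods) / capacity


def is_overcommitted(pods: List[Tuple[int, int]], capacity: int,
                     threshold: float = OVERCOMMIT_THRESHOLD) -> bool:
    """True when total limits exceed capacity by more than the threshold ratio."""
    return limit_to_capacity_ratio(pods, capacity) > threshold


# CPU in millicpu on a 4-CPU node.
loose_limits = [(500, 3000), (1000, 4000)]  # limits far above requests
limits_equal_requests = [(500, 500), (1000, 1000)]
print(limit_to_capacity_ratio(loose_limits, 4000))           # 1.75
print(is_overcommitted(loose_limits, 4000))                  # True
print(limit_to_capacity_ratio(limits_equal_requests, 4000))  # 0.375
```

When every limit equals its request, total limits equal total requests, and the scheduler already keeps total requests within capacity, so the ratio cannot go above 1.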


Pod Limit

The kubelet on each node enforces a pod limit (110 by default) on the number of pods that can run on that node. Once a node reaches this limit, no more pods can be scheduled onto it. When working out how many pods can run on each node, count pods in all namespaces, because system pods such as DNS and networking pods count towards the limit.
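To count pods per node across every namespace, a Python sketch can work from the items list returned by kubectl get pods --all-namespaces -o json, where each pod's node appears at spec.nodeName. The function names are hypothetical. The sketch counts every pod that has a node assignment, whatever its phase, so treat the result as an approximation.

```python
from collections import Counter

DEFAULT_MAX_PODS = 110  # the kubelet's default pod limit


def pods_per_node(pods):
    """Count pods per node across all namespaces.

    Assumes `pods` is the "items" list from
    `kubectl get pods --all-namespaces -o json`. kube-system pods
    (DNS, networking) are counted like any other pod. Pods with no
    spec.nodeName have not been assigned to a node and are skipped.
    """
    return Counter(
        pod["spec"]["nodeName"]
        for pod in pods
        if pod.get("spec", {}).get("nodeName")
    )


def remaining_pod_slots(pods, max_pods=DEFAULT_MAX_PODS):
    """How many more pods each node could accept before reaching max_pods."""
    return {node: max_pods - count for node, count in pods_per_node(pods).items()}


pods = [
    {"metadata": {"namespace": "kube-system"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"namespace": "default"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"namespace": "default"}, "spec": {"nodeName": "node-b"}},
    {"metadata": {"namespace": "default"}, "spec": {}},  # still pending
]
print(remaining_pod_slots(pods))  # {'node-a': 108, 'node-b': 109}
```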

There are two ways to avoid hitting the pod-per-node limit:

  • Add more nodes: add nodes to the cluster so that pods can be scheduled onto the new nodes
  • Modify the kubelet command: run the kubelet on the node with the --max-pods argument to set the maximum number of pods for that node
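Both options can be compared with a quick capacity estimate. The Python sketch below considers only the pod-count limit, not CPU or memory. The system_pods_per_node parameter is an assumption standing in for pods (such as networking DaemonSet pods) that run on every node, and the --max-pods value of 250 in the example is purely hypothetical.

```python
import math


def nodes_needed(workload_pods: int, max_pods: int = 110,
                 system_pods_per_node: int = 0) -> int:
    """Estimate how many nodes keep every pod under the per-node pod limit.

    system_pods_per_node is an assumed number of pods that run on every
    node and use up slots. Only the pod-count limit is considered.
    """
    slots_per_node = max_pods - system_pods_per_node
    if slots_per_node <= 0:
        raise ValueError("system pods alone use up the pod limit")
    return math.ceil(workload_pods / slots_per_node)


# 500 application pods, 5 system pods on every node, default limit.
print(nodes_needed(500, max_pods=110, system_pods_per_node=5))  # 5
# Same workload if the kubelet were started with a hypothetical --max-pods=250.
print(nodes_needed(500, max_pods=250, system_pods_per_node=5))  # 3
```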


Evicted Pods

The kubelet can prevent total resource starvation by proactively evicting pods when a resource is almost exhausted. Resources that can be monitored for pod eviction include memory, disk space, and disk inodes; CPU is not an eviction signal. Both soft and hard eviction thresholds can be configured: a soft threshold also takes a grace period that must elapse before pods are evicted, while a hard threshold triggers eviction as soon as it is crossed. Together, these settings determine how the kubelet evicts pods to reclaim resources.
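The kubelet configures these thresholds with flags such as --eviction-hard, --eviction-soft, and --eviction-soft-grace-period. The Python sketch below models only the soft-versus-hard decision for a single signal (available memory); it is not how the kubelet is implemented. The class, the function, the sampled-history input, and the 100Mi/500Mi/90-second values are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

MI = 1024 ** 2  # bytes in one mebibyte (Mi)


@dataclass
class MemoryEvictionPolicy:
    """Hypothetical thresholds on available memory, in bytes."""
    hard_threshold: int        # evict as soon as available memory drops below this
    soft_threshold: int        # evict only after staying below this...
    soft_grace_seconds: float  # ...for at least this long


def should_evict(policy: MemoryEvictionPolicy,
                 samples: List[Tuple[float, int]]) -> bool:
    """samples are (timestamp_seconds, available_bytes) pairs in time order."""
    if not samples:
        return False
    latest_time, latest_available = samples[-1]
    # Hard threshold: no grace period.
    if latest_available < policy.hard_threshold:
        return True
    # Soft threshold: find when the current, unbroken breach began.
    breach_start = None
    for timestamp, available in reversed(samples):
        if available >= policy.soft_threshold:
            break
        breach_start = timestamp
    if breach_start is None:
        return False
    return latest_time - breach_start >= policy.soft_grace_seconds


policy = MemoryEvictionPolicy(hard_threshold=100 * MI, soft_threshold=500 * MI,
                              soft_grace_seconds=90)
print(should_evict(policy, [(0, 400 * MI), (60, 400 * MI)]))                 # False
print(should_evict(policy, [(0, 400 * MI), (60, 400 * MI), (90, 400 * MI)]))  # True
print(should_evict(policy, [(0, 50 * MI)]))                                   # True
```

A brief dip below the soft threshold is tolerated, which avoids evicting pods over short spikes, while the hard threshold acts as a backstop for sudden exhaustion.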

Blue Matador monitors your Kubernetes cluster for eviction events on each node and warns you when evictions happen. Evictions can be fine-tuned by changing the kubelet command-line parameters and by making sure that pods define their resource requests and limits appropriately.

The kubelet may evict more pods than needed to reclaim resources, and may evict pods whose removal does not relieve the resource starvation. This is because the kubelet uses a pod's Quality of Service (QoS) class to decide which pods to evict first, and pods whose QoS class is Guaranteed are evicted after other pods. To make a critical pod Guaranteed, set both CPU and memory requests and limits for every container in the pod, with each request equal to its limit.
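The QoS rules can be sketched in Python as a small classifier. It assumes each container is a dict with optional "requests" and "limits" maps whose values are already normalized numbers (millicpu for cpu, bytes for memory), so that "1" and "1000m" cannot compare unequal by accident. It looks only at the containers it is given. A pod is Guaranteed when every container has CPU and memory limits with matching requests (an omitted request defaults to its limit), BestEffort when no container sets any CPU or memory request or limit, and Burstable otherwise.

```python
from typing import Dict, List

RESOURCES = ("cpu", "memory")


def qos_class(containers: List[Dict[str, Dict[str, int]]]) -> str:
    """Classify a pod as "Guaranteed", "Burstable", or "BestEffort".

    Each container is a dict with optional "requests" and "limits" maps
    from "cpu"/"memory" to normalized numbers (millicpu, bytes).
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if any(r in requests or r in limits for r in RESOURCES):
            any_set = True
        for r in RESOURCES:
            limit = limits.get(r)
            # When only a limit is given, the request defaults to the limit.
            request = requests.get(r, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


critical = [{"requests": {"cpu": 500, "memory": 256 * 1024 ** 2},
             "limits": {"cpu": 500, "memory": 256 * 1024 ** 2}}]
print(qos_class(critical))                                              # Guaranteed
print(qos_class([{"requests": {"cpu": 250}, "limits": {"cpu": 500}}]))  # Burstable
print(qos_class([{}]))                                                  # BestEffort
```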