Kubernetes Cluster | Blue Matador

API Health

Blue Matador periodically checks the healthz endpoint of the Kubernetes API to ensure that the API reports itself as healthy. If the check fails consistently, an event is created. Although this health check rarely fails, a failure can severely limit your ability to manage the cluster. If you can still make API calls and use kubectl, check the component statuses and the health of each node, and look for any events that may have caused the API to become unhealthy. Another possible cause is a network issue that prevents one of Blue Matador's agents from reaching the API. Check for firewalls between the agent and the API, and confirm that the issue is not widespread.
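To reproduce this kind of check by hand, the following is a minimal Python sketch, not Blue Matador's actual implementation. It probes the healthz endpoint and reports a problem only after several consecutive failures. The function names, the threshold of 3, and the example URL are assumptions made for illustration; the token and CA file depend on how your cluster authenticates requests.

```python
import ssl
import urllib.error
import urllib.request


def probe_healthz(base_url, token=None, ca_file=None, timeout=5.0):
    """Return True if GET <base_url>/healthz answers with HTTP 200.

    base_url, token and ca_file are placeholders for your own cluster's
    API address, bearer token and CA bundle (all hypothetical here).
    """
    request = urllib.request.Request(base_url.rstrip("/") + "/healthz")
    if token:
        request.add_header("Authorization", "Bearer " + token)
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections, timeouts and TLS errors
        # are all treated as a failed probe.
        return False


def failing_consistently(results, threshold=3):
    """Return True if the most recent `threshold` probe results all failed.

    `results` is a list of booleans, oldest first. The threshold is an
    assumed value, not one documented by Blue Matador.
    """
    if threshold <= 0 or len(results) < threshold:
        return False
    return not any(results[-threshold:])
```

A caller would append the result of probe_healthz("https://my-cluster.example:6443") to a list on each polling interval and raise an alert once failing_consistently(results) returns True. Requiring several failures in a row avoids alerting on a single dropped request.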

Component Statuses

A Kubernetes component status describes the high-level health of one of several essential Kubernetes cluster services. An unhealthy component can cause problems such as incorrectly scheduled pods or nodes that the cluster fails to recognize. Unfortunately, component statuses are not well documented. Issues with them can be difficult to diagnose and may even be benign.

A few common sources of component status issues include:

Bootstrap: In a newly bootstrapped cluster, components may take time to become healthy, or may not initialize properly at all.
Cluster Upgrade: In a quickly evolving product like Kubernetes, bugs are sometimes introduced in minor version upgrades.
etcd: Kubernetes relies on etcd as the key-value store for all cluster data. See the Kubernetes documentation for help administering etcd in your cluster.
scheduler: The scheduler is responsible for assigning pods to nodes, but does not run the pods. If there is an issue with the scheduler, pods may not be scheduled to run.
controller-manager: The controller-manager runs the controllers behind resources such as deployments and daemonsets, which work to bring the current state of the system to the state described in each resource. Issues with the controller-manager will affect anything that relies on a controller to create pods.
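When kubectl still works, the component statuses can be listed with kubectl get componentstatuses -o json (newer Kubernetes releases mark this API as deprecated, so the output may be incomplete there). The sketch below shows one way to pick out the unhealthy components from that JSON. The function name and the sample data are hypothetical and written for illustration only; the sample is not real cluster output.

```python
def unhealthy_components(component_list):
    """Return {component name: reason} for components not reporting Healthy=True.

    `component_list` is the parsed JSON from
    `kubectl get componentstatuses -o json`.
    """
    problems = {}
    for item in component_list.get("items", []):
        name = item.get("metadata", {}).get("name", "<unknown>")
        healthy = [c for c in item.get("conditions", []) if c.get("type") == "Healthy"]
        if not healthy:
            problems[name] = "no Healthy condition reported"
            continue
        condition = healthy[0]
        # Condition status is the string "True", "False" or "Unknown".
        if condition.get("status") != "True":
            problems[name] = condition.get("error") or condition.get("message") or "unknown"
    return problems


# Hand-written sample in the shape kubectl returns (illustrative values only).
SAMPLE = {
    "kind": "List",
    "items": [
        {
            "kind": "ComponentStatus",
            "metadata": {"name": "etcd-0"},
            "conditions": [{"type": "Healthy", "status": "True", "message": "ok"}],
        },
        {
            "kind": "ComponentStatus",
            "metadata": {"name": "scheduler"},
            "conditions": [
                {"type": "Healthy", "status": "False", "message": "connection refused (sample text)"}
            ],
        },
    ],
}

if __name__ == "__main__":
    for name, reason in unhealthy_components(SAMPLE).items():
        print(name + ": " + reason)
```

In practice the dictionary would come from json.loads applied to the kubectl output. An unhealthy entry is a starting point for investigation rather than proof of an outage, since, as noted above, some component status issues are benign.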

Solutions

Using a bootstrapping tool such as kops, or running a managed cluster in your cloud provider, can make bootstrapping and managing a Kubernetes cluster much easier. Having a defined process for upgrading cluster resources, or even for spinning up an entirely new cluster, can reduce the impact of component issues.

To debug problems related to cluster upgrades, search the Kubernetes GitHub issues to see whether the problem you are seeing has a workaround or has been fixed in a different version.

Resources

Nodes (Kubernetes Documentation)
Kubelet Options (Kubernetes Documentation)
Troubleshoot Clusters (Kubernetes Documentation)
How Does the Kubernetes Scheduler Work? (Blog: Julia Evans)
Kubernetes GitHub Issues (GitHub)
Kops (GitHub)