A Kubernetes Job manages the execution of one or more pods until completion. A job can limit the runtime of a pod, keep track of the pod's status, and retry if the pod fails. Jobs themselves can be managed by a CronJob, which schedules jobs to run using a cron expression. Blue Matador monitors all jobs running in your cluster and notifies you if a job completely fails.
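Jobs can be configured to fail under certain conditions. Doing so is recommended when creating jobs, because it lets you track whether the pods being run complete successfully. There are multiple ways to configure a job so that it fails, such as capping the number of retries with backoffLimit or capping the total runtime with activeDeadlineSeconds.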
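As an illustration of configuring a job so that it can fail, the following Python sketch builds a batch/v1 Job manifest as a plain dictionary and prints it as JSON, which kubectl accepts as well as YAML. The field names backoffLimit, activeDeadlineSeconds, and restartPolicy come from the Kubernetes Job API. The helper build_job_manifest, the job name, the image, the command, and the chosen limits are hypothetical values picked for the example, not recommendations.

```python
import json


def build_job_manifest(name, image, command, backoff_limit=3, deadline_seconds=600):
    """Return a batch/v1 Job manifest as a plain dict.

    backoff_limit and deadline_seconds are illustrative defaults (assumptions),
    not values taken from the original text.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Mark the job as failed after this many retries.
            "backoffLimit": backoff_limit,
            # Mark the job as failed if it is still active after this many seconds.
            "activeDeadlineSeconds": deadline_seconds,
            "template": {
                "spec": {
                    # "Never" leaves a failed container alone instead of restarting
                    # it in place, so the job controller creates a new pod per retry.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


# Hypothetical job whose container always exits non-zero, so it eventually fails.
manifest = build_job_manifest("example-job", "busybox", ["sh", "-c", "exit 1"])
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with kubectl apply -f would create a job that retries up to the backoff limit and then reports failure, which is the state a monitoring tool can pick up.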
When Blue Matador detects that a recently completed job failed, it creates an anomaly. If the same job, or multiple jobs controlled by the same CronJob, fail consistently, it creates a warning so that the issue can be investigated further.
When troubleshooting failed jobs, inspect both the job and the pod resources it creates. Kubernetes creates events if a job is timing out, or if a pod fails to start or exits unexpectedly and causes the job to fail. Depending on your job configuration, it could take a long time for a job to give up completely and mark itself as failed. To correlate the failure with its cause, check for other events around the time the job first attempted to run.
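When inspecting a job resource, the failure state is recorded in its status conditions. The sketch below is a minimal example that assumes the job has already been loaded into a Python dictionary with the shape produced by kubectl get job <job-name> -o json. The function name job_failure and the sample data are hypothetical, and the sample is hand-written rather than captured from a real cluster.

```python
def job_failure(job):
    """Return (reason, message) if the Job has a Failed condition set to "True".

    Returns None when the job has not been marked as failed.
    """
    conditions = job.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == "Failed" and cond.get("status") == "True":
            return cond.get("reason", ""), cond.get("message", "")
    return None


# Hypothetical, hand-written job status for illustration only.
sample_job = {
    "metadata": {"name": "example-job"},
    "status": {
        "failed": 4,
        "conditions": [
            {
                "type": "Failed",
                "status": "True",
                "reason": "BackoffLimitExceeded",
                "message": "Job has reached the specified backoff limit",
            }
        ],
    },
}

result = job_failure(sample_job)
if result is not None:
    reason, message = result
    print(f"{sample_job['metadata']['name']} failed: {reason} ({message})")
```

The reason field typically distinguishes a job that ran out of retries (BackoffLimitExceeded) from one that ran past its deadline (DeadlineExceeded), which helps decide whether to look at pod logs or at the time limit first.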