AWS ECS Task | Blue Matador

Task Connectivity

Connectivity indicates whether the task is connected to ECS. A task may become disconnected if the network or security policies affecting the task change, or if the underlying EC2 instance that the task is running on fails. For tasks controlled by a service, the service will usually replace the task. A task that was run manually may need to be re-created manually if the condition persists.
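As a rough illustration of the check described above, the sketch below filters task records for a disconnected state. It assumes records shaped like the `tasks` list in an ECS DescribeTasks response, where each task carries a `connectivity` field of `CONNECTED` or `DISCONNECTED`. The function name `disconnected_tasks` and the sample ARNs are hypothetical, and this is not necessarily how Blue Matador performs the check.

```python
def disconnected_tasks(tasks):
    """Return the ARNs of tasks whose connectivity is reported as DISCONNECTED."""
    return [
        t["taskArn"]
        for t in tasks
        if t.get("connectivity") == "DISCONNECTED"
    ]


# Invented sample data in the shape of a DescribeTasks response.
sample = [
    {"taskArn": "arn:aws:ecs:us-east-1:111122223333:task/demo/aaa",
     "connectivity": "CONNECTED"},
    {"taskArn": "arn:aws:ecs:us-east-1:111122223333:task/demo/bbb",
     "connectivity": "DISCONNECTED"},
]
print(disconnected_tasks(sample))
```

With boto3 installed and credentials configured, the input list could come from `boto3.client("ecs").describe_tasks(cluster=..., tasks=[...])["tasks"]`.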

Task Health

The health status of a task is determined by the health status of its essential containers. The ECS agent only monitors the health checks defined in the task definition for this purpose; any Docker health checks embedded in the container image are ignored. The task is healthy if all essential containers are healthy, unhealthy if any essential container is unhealthy, and unknown if any essential container is unknown and none is unhealthy.
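The roll-up rule above can be sketched as a small function. This is only an illustration: the input is a hypothetical list of dicts that pairs each container's `essential` flag (which lives in the task definition) with its reported `healthStatus`, and it assumes that an unhealthy container takes precedence over an unknown one when both occur.

```python
def task_health(containers):
    """Roll essential-container health up into a single task health status."""
    statuses = [
        c.get("healthStatus", "UNKNOWN")
        for c in containers
        if c.get("essential", True)  # non-essential containers are ignored
    ]
    # Assumption: UNHEALTHY wins over UNKNOWN when both are present.
    if "UNHEALTHY" in statuses:
        return "UNHEALTHY"
    # Assumption: no essential statuses at all is treated as UNKNOWN.
    if not statuses or "UNKNOWN" in statuses:
        return "UNKNOWN"
    return "HEALTHY"


# Invented example: the sidecar is non-essential, so its failure is ignored.
print(task_health([
    {"name": "web", "essential": True, "healthStatus": "HEALTHY"},
    {"name": "sidecar", "essential": False, "healthStatus": "UNHEALTHY"},
]))
```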

Blue Matador monitors running tasks to ensure they are not unhealthy. An unknown health status is considered healthy for this purpose, since task definitions in real-world use often omit health checks, which results in an unknown health status. To get the full benefit of this event, and so that ECS services properly replace unhealthy tasks, define health checks in your task definitions for all essential containers.
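To make the recommendation above concrete, here is a sketch of what a container health check might look like and a helper that lists essential containers without one. The field names follow the `healthCheck` object of an ECS container definition; the command, the timing values, the helper name `containers_missing_health_check`, and the sample task definition are all illustrative assumptions.

```python
# Illustrative health check; the command and timings are assumptions
# and should be adapted to the container being checked.
EXAMPLE_HEALTH_CHECK = {
    "command": ["CMD-SHELL", "curl -f http://localhost/ || exit 1"],
    "interval": 30,     # seconds between checks
    "timeout": 5,       # seconds before a single check counts as failed
    "retries": 3,       # consecutive failures before the container is unhealthy
    "startPeriod": 60,  # grace period in seconds before failures count
}


def containers_missing_health_check(task_definition):
    """Return the names of essential containers that define no health check."""
    return [
        c["name"]
        for c in task_definition.get("containerDefinitions", [])
        if c.get("essential", True) and not c.get("healthCheck")
    ]


# Invented task definition fragment for demonstration.
task_def = {
    "containerDefinitions": [
        {"name": "web", "essential": True, "healthCheck": EXAMPLE_HEALTH_CHECK},
        {"name": "worker", "essential": True},
        {"name": "log-router", "essential": False},
    ]
}
print(containers_missing_health_check(task_def))
```

A task definition dict of this shape could be obtained from the `taskDefinition` key of a DescribeTaskDefinition response.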

Task Stopped

When a task is stopped for any unexpected reason, Blue Matador creates an event. Unexpected reasons include running out of memory, failed load balancer health checks, failure of the underlying EC2 container instance, and many others. For tasks scheduled by an ECS service, the task should be replaced automatically. If there is an issue with the task or service definition, tasks may be repeatedly stopped and replaced; this should be addressed immediately to prevent service interruptions.
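When investigating a stopped task, the stop code, the stop reason, and the per-container exit details are usually the first things to look at. The sketch below collects them from task records shaped like a DescribeTasks response (`lastStatus`, `stopCode`, `stoppedReason`, and each container's `exitCode` and `reason`). Which stop codes count as expected is an assumption made here for illustration, and the source does not describe Blue Matador's actual criteria.

```python
# Assumption: stops with these codes were requested on purpose.
EXPECTED_STOP_CODES = {"UserInitiated"}


def unexpected_stops(tasks):
    """Summarize stopped tasks whose stop code is not in EXPECTED_STOP_CODES."""
    findings = []
    for t in tasks:
        if t.get("lastStatus") != "STOPPED":
            continue  # only stopped tasks are of interest
        if t.get("stopCode") in EXPECTED_STOP_CODES:
            continue  # deliberately stopped, so skip it
        findings.append({
            "taskArn": t["taskArn"],
            "stopCode": t.get("stopCode"),
            "stoppedReason": t.get("stoppedReason"),
            "containers": [
                {"name": c.get("name"),
                 "exitCode": c.get("exitCode"),
                 "reason": c.get("reason")}
                for c in t.get("containers", [])
            ],
        })
    return findings


# Invented sample data for demonstration.
sample = [
    {"taskArn": "task/a", "lastStatus": "STOPPED",
     "stopCode": "EssentialContainerExited",
     "stoppedReason": "Essential container in task exited",
     "containers": [{"name": "web", "exitCode": 137,
                     "reason": "OutOfMemoryError: Container killed due to memory usage"}]},
    {"taskArn": "task/b", "lastStatus": "STOPPED", "stopCode": "UserInitiated"},
    {"taskArn": "task/c", "lastStatus": "RUNNING"},
]
for finding in unexpected_stops(sample):
    print(finding["taskArn"], finding["stopCode"], finding["containers"])
```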

Task Pending

Blue Matador detects tasks that have been created but have taken a significant amount of time to enter the running state. A pending task can be caused by a slow network, insufficient resources to schedule the task, or an issue with the ECS agent running on the underlying EC2 container instances.

If you are using the EC2 launch type for your tasks, the problem may be related to the ECS agent having trouble stopping previous versions of the tasks during an update. ECS users have found success in manually stopping the old version of a task or restarting the ECS agent on the EC2 instance.
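The pending-task detection described in this section can be approximated with a simple age check. The sketch below assumes task records shaped like a DescribeTasks response, with `lastStatus` and a timezone-aware `createdAt` datetime (as boto3 returns it). The 10-minute threshold and the function name `long_pending_tasks` are hypothetical; the source does not state what Blue Matador treats as a significant amount of time.

```python
from datetime import datetime, timedelta, timezone


def long_pending_tasks(tasks, now, threshold=timedelta(minutes=10)):
    """Return ARNs of tasks that have been PENDING for longer than threshold."""
    return [
        t["taskArn"]
        for t in tasks
        if t.get("lastStatus") == "PENDING"
        and now - t["createdAt"] > threshold
    ]


# Invented sample data with a fixed clock so the result is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    {"taskArn": "task/old", "lastStatus": "PENDING",
     "createdAt": now - timedelta(minutes=25)},
    {"taskArn": "task/new", "lastStatus": "PENDING",
     "createdAt": now - timedelta(minutes=1)},
    {"taskArn": "task/run", "lastStatus": "RUNNING",
     "createdAt": now - timedelta(hours=3)},
]
print(long_pending_tasks(sample, now))
```

Passing `now` explicitly keeps the function easy to test; in live use it could be `datetime.now(timezone.utc)`.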

Resources

ECS Services (AWS Documentation)
Amazon ECS Troubleshooting (AWS Documentation)
Scheduling Amazon ECS Tasks (AWS Documentation)
Amazon ECS Task Definitions (AWS Documentation)
Task stuck in pending state #731 (GitHub)
Fixing ECS Tasks stuck in a pending state (Medium)