AWS ECS Service | Blue Matador

Resource Utilization

For ECS services running on Fargate, or on EC2 container instances with version 1.4.0 or later of the container agent installed, the CPUUtilization and MemoryUtilization CloudWatch metrics show how much of their reserved CPU and memory your tasks are using, which helps you decide when to scale.
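As a rough sketch of how these metrics could be read outside of Blue Matador, the helper below builds the arguments for a CloudWatch get_metric_statistics request scoped to one service. The function name service_metric_query, its default lookback window, and the cluster and service names in the usage note are assumptions made for illustration. The AWS/ECS namespace and the ClusterName and ServiceName dimensions are the ones ECS publishes service metrics under.

```python
from datetime import datetime, timedelta, timezone


def service_metric_query(cluster, service, metric_name, minutes=60, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    The helper name and defaults are illustrative, not part of any AWS SDK.
    """
    if metric_name not in ("CPUUtilization", "MemoryUtilization"):
        raise ValueError(f"unsupported service metric: {metric_name}")
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ECS",
        "MetricName": metric_name,
        # Both dimensions together select the service-level metric.
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Average", "Maximum"],
    }
```

With boto3 installed and credentials configured, the result could be passed along as boto3.client("cloudwatch").get_metric_statistics(**service_metric_query("demo-cluster", "web", "CPUUtilization")), where "demo-cluster" and "web" are placeholder names.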

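Unlike the cluster resource utilization metrics, the service metrics may exceed 100% utilization. Service utilization is measured against the CPU and memory reserved in the task definition, so tasks that use more than their reservation push the metric past 100%. This is possible for memory when the task definition for a service’s tasks does not specify a hard limit, and for CPU with the EC2 launch type.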
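To make the arithmetic concrete, here is a minimal sketch that assumes service utilization is the total amount used by the service's tasks divided by the total amount reserved for them in the task definition. The function name and the sample numbers are hypothetical.

```python
def service_utilization(used_per_task, reserved_per_task):
    """Return the percent of reserved units in use across a service's tasks.

    used_per_task: usage for each running task (MiB or CPU units).
    reserved_per_task: the reservation from the task definition, same unit.
    """
    total_reserved = reserved_per_task * len(used_per_task)
    if total_reserved <= 0:
        raise ValueError("need at least one task and a positive reservation")
    return 100.0 * sum(used_per_task) / total_reserved


# Hypothetical service: two tasks with a 512 MiB soft memory limit each,
# currently using 700 MiB and 836 MiB because no hard limit stops them.
memory_percent = service_utilization([700, 836], 512)
print(memory_percent)  # 150.0
```

Under that assumption, the two tasks use 1536 MiB against 1024 MiB reserved, so the service reports 150% memory utilization even though nothing has failed.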

Blue Matador monitors CPU and memory utilization and warns you when a service approaches 100% utilization. Even if other services are not affected by utilization above 100%, you should update your task definitions to reflect actual usage so that the overage does not cause problems for other services later.

Running Tasks

One of the key features provided by ECS services is controlling the number of running tasks. For daemon services, the number of tasks is equal to the number of EC2 container instances in your cluster. For replica services, the number of tasks is defined by you in the service. Blue Matador monitors the number of running tasks in the service and will alert or warn you in the event the service has fewer running tasks than expected.
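The comparison described above can be sketched as a small function. It takes a service record shaped like an entry from the ECS DescribeServices response (the schedulingStrategy, desiredCount, and runningCount fields) plus the number of container instances in the cluster. The function name task_shortfall is hypothetical, this is not Blue Matador's actual check, and it assumes a daemon service is expected to run on every container instance, ignoring placement constraints.

```python
def task_shortfall(service, container_instance_count):
    """Return how many tasks a service is missing (0 if none).

    service: dict with 'runningCount', and 'desiredCount' for replica
             services; 'schedulingStrategy' defaults to 'REPLICA'.
    container_instance_count: EC2 container instances in the cluster.
    """
    strategy = service.get("schedulingStrategy", "REPLICA")
    if strategy == "DAEMON":
        # Daemon services are expected to run one task per instance.
        expected = container_instance_count
    else:
        # Replica services run the count defined in the service.
        expected = service["desiredCount"]
    return max(0, expected - service["runningCount"])


# Hypothetical records for illustration only.
replica = {"schedulingStrategy": "REPLICA", "desiredCount": 4, "runningCount": 3}
daemon = {"schedulingStrategy": "DAEMON", "runningCount": 5}
print(task_shortfall(replica, 5))  # 1
print(task_shortfall(daemon, 5))   # 0
```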

You could have fewer tasks than expected for a variety of reasons. For daemon services, adding a new node to your ECS cluster would cause the expected number of tasks to increase. It may take time for the node to become available and for your tasks to get scheduled onto it. Any time you increase the desired number of tasks for a replica service, you may have issues finding capacity to schedule the extra tasks. If this is the case, you may have to scale up your cluster or change the resource requirements for the task.

In a newly created service, the task definition may reference a bad image, preventing tasks from launching. Newly launched tasks may also exit unexpectedly, or be terminated for failing the service health check. Check the list of stopped tasks for the service, and see whether the logs for those tasks indicate what went wrong. The event list for the service may also help you diagnose why tasks are being terminated.
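One way to review stopped tasks in bulk is sketched below, assuming boto3 is installed and credentials are configured. The helper names fetch_stopped_tasks and summarize_stopped_tasks are hypothetical. The list_tasks and describe_tasks calls and the stoppedReason, exitCode, and reason fields come from the ECS API, and the sample record is invented for illustration.

```python
def fetch_stopped_tasks(cluster, service):
    """Fetch recently stopped tasks for a service (requires boto3)."""
    import boto3  # imported lazily so the summarizer works without it

    ecs = boto3.client("ecs")
    arns = ecs.list_tasks(
        cluster=cluster, serviceName=service, desiredStatus="STOPPED"
    )["taskArns"]
    if not arns:
        return []
    return ecs.describe_tasks(cluster=cluster, tasks=arns)["tasks"]


def summarize_stopped_tasks(tasks):
    """Return one line per task and one per container with a known failure."""
    lines = []
    for task in tasks:
        reason = task.get("stoppedReason", "no reason recorded")
        lines.append(f"{task.get('taskArn', '?')}: {reason}")
        for container in task.get("containers", []):
            details = []
            if "exitCode" in container:
                details.append(f"exit code {container['exitCode']}")
            if container.get("reason"):
                details.append(container["reason"])
            if details:
                lines.append(f"  {container.get('name', '?')}: {', '.join(details)}")
    return lines


# Invented sample in the shape of a describe_tasks result.
sample = [{
    "taskArn": "task/demo/abc123",
    "stoppedReason": "Essential container in task exited",
    "containers": [{"name": "web", "exitCode": 1}],
}]
for line in summarize_stopped_tasks(sample):
    print(line)
```

A task that never started, for example because of a bad image, typically has no exit code, so the container-level reason field is the more useful clue in that case.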

Resources

ECS Services (AWS Documentation)
Amazon ECS CloudWatch Metrics (AWS Documentation)
Amazon ECS Troubleshooting (AWS Documentation)