Elastic Load Balancers route traffic to your application. You can generally expect a steady stream of requests to your load balancers; even some 400s and 500s are normal. However, the rate at which your load balancers serve requests or produce 400s or 500s can be a good indicator of application health. An anomaly in these metrics can signal problems before they become apparent in other parts of the application. Blue Matador automatically detects these anomalies and alerts you about them.


Request Count


A healthy application should see a relatively stable request rate. An anomalous increase or decrease in request count can each signal a malfunctioning application. Possible causes of changes in request count include:

  • Errors in your server-side code causing retry logic in your clients to make many requests to a failing endpoint
  • A release of buggy client code causing erroneous API requests
  • A malfunctioning cache layer

If the request count increased for legitimate reasons (an increase in users or a new feature), you may need to add targets to your load balancer to handle the increased load.
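The baseline idea can be sketched in a few lines. The example below (plain Python, with made-up per-minute counts and an arbitrary 3-sigma threshold) flags a request count that deviates sharply from the recent baseline; a production detector, like Blue Matador's, also has to account for seasonality and trends.

```python
from statistics import mean, stdev

def is_anomalous(counts, latest, threshold=3.0):
    """Flag a request count more than `threshold` standard
    deviations away from the recent baseline."""
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Hypothetical per-minute request counts, then a sudden spike
# (e.g. client retry logic hammering a failing endpoint):
baseline = [1000, 1020, 980, 1010, 995, 1005]
print(is_anomalous(baseline, 1015))  # → False, normal fluctuation
print(is_anomalous(baseline, 5000))  # → True, anomalous spike
```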

 

400s


When the rate of 4xx response codes increases, the problem most likely lies with a buggy client making requests to your ELBs. Possible causes include:

  • Typos in URLs resulting in a spike of 404 errors
  • Parameter names and types changed for REST APIs resulting in 400 errors
  • A bug in authentication code resulting in 401s and 403s
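Because each cause above maps to a different status code, breaking the 4xx responses down by code points at the likely client bug. A minimal sketch, assuming you have a list of status codes pulled from your access logs (the sample codes here are made up):

```python
from collections import Counter

def error_breakdown(status_codes):
    """Group 4xx responses by status code: a spike of 404s suggests
    URL typos, 400s suggest bad parameters, 401/403s suggest auth bugs."""
    counts = Counter(c for c in status_codes if 400 <= c < 500)
    total = len(status_codes)
    return {code: n / total for code, n in counts.items()}

codes = [200, 200, 404, 404, 404, 401, 200, 200, 200, 200]
print(error_breakdown(codes))  # → {404: 0.3, 401: 0.1}
```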

 

500s


When the rate of 5xx response codes increases, the problem is most likely a bug in your server-side code. The increase can often be tied to a specific code release, so your release schedule is the first place to look for clues as to what went wrong. Other possible causes include:

  • Network problems between microservices. 500s often cascade from upstream services down to the client.
  • Bugs in software dependencies
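To check whether a deploy is the culprit, compare the 5xx rate before and after the release time. A rough sketch with hypothetical (minute, was_5xx) samples and a hypothetical release at minute 3:

```python
def error_rate_by_window(samples, release_time):
    """Split (timestamp, is_5xx) samples into before/after a release
    to see whether the deploy correlates with the 5xx spike."""
    before = [e for t, e in samples if t < release_time]
    after = [e for t, e in samples if t >= release_time]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(before), rate(after)

# Hypothetical samples: errors start right at the release
samples = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]
print(error_rate_by_window(samples, release_time=3))  # → (0.0, 1.0)
```

A jump that lines up exactly with the release time strongly suggests rolling back; a jump at an unrelated time points toward network problems or a dependency instead.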

 

Access Logs


Access logs are very helpful when diagnosing issues with ELBs. By default, ELB does not collect access logs, but it can be configured to send them to S3. You can then configure your log management tool (or download the files and use grep) to look for the endpoints that are causing problems.
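If you end up processing the files yourself, the log lines are space-delimited with the request quoted, so they are easy to parse. A sketch for the Classic Load Balancer access log format (field positions per the AWS log format documentation; the sample line is illustrative):

```python
import shlex

def parse_elb_log_line(line):
    """Pull the ELB status code and request out of a Classic ELB
    access log line; shlex.split keeps quoted fields together."""
    fields = shlex.split(line)
    return {
        "elb_status_code": int(fields[7]),
        "request": fields[11],
    }

line = ('2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
        '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 '
        '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -')
print(parse_elb_log_line(line))
```

Counting parsed lines by request path and status code then surfaces the problem endpoints directly.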

 

Latency


For Classic Load Balancers, the Latency metric in CloudWatch measures the time it takes for a registered instance to send response headers after receiving a request from the load balancer. For Application Load Balancers, the equivalent CloudWatch metric is TargetResponseTime.

An increase in latency can indicate a performance issue with your application. If traffic patterns for your load balancer have not changed significantly, check whether a downstream service such as a database is experiencing high latency and propagating that delay to your web servers. If you have seen an increase in traffic, it is possible that your instances are overloaded, and adding targets to the load balancer may help. For low-traffic load balancers, it is also possible that the average latency is thrown off by a few requests taking a very long time.
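That last effect is easy to demonstrate: compare the average to a percentile. In the made-up sample below, two very slow requests out of a hundred inflate the average more than sixfold while the median stays flat, which is why percentiles are usually more telling than averages on low-traffic load balancers.

```python
from statistics import mean, quantiles

# Hypothetical latencies: 98 fast requests, 2 very slow ones
latencies_ms = [20] * 98 + [5000, 6000]

avg = mean(latencies_ms)
p50 = quantiles(latencies_ms, n=100)[49]  # 50th percentile
print(avg)  # → 129.6: average dominated by two outliers
print(p50)  # → 20.0: the typical request is still fast
```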

 

Bytes Processed


For Application load balancers, the ProcessedBytes metric measures the total number of bytes going in and out of the load balancer. A change in this metric can be caused by two things:

  • Request rate has increased/decreased
  • Request or response sizes have increased/decreased

Anomalies with bytes processed are mostly useful for correlating other issues.
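Dividing ProcessedBytes by RequestCount for the same period distinguishes the two causes: if the ratio is flat, traffic volume changed; if the ratio moved, payload sizes changed. A trivial sketch with made-up numbers:

```python
def avg_payload_size(processed_bytes, request_count):
    """Bytes per request over a window: separates a traffic change
    from a payload-size change behind the same ProcessedBytes jump."""
    return processed_bytes / request_count

# Same ProcessedBytes, two different explanations:
print(avg_payload_size(10_000_000, 10_000))  # → 1000.0 bytes/request
print(avg_payload_size(10_000_000, 2_000))   # → 5000.0: payloads grew
```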

 
