Amazon API Gateway Metric Anomalies

Amazon API Gateway is a fully managed AWS service for creating REST APIs. API Gateway publishes several usage and performance metrics to CloudWatch. Blue Matador automatically watches these metrics and notifies you of anomalies that could signal problems with your REST APIs.

API Gateway 4xx Responses

A number of 4xx responses from an API is fairly normal, but Blue Matador will watch for anomalies in this metric. When your API responds with more 4xx errors than usual, it could signal major issues in your application. Possible sources of these errors include:

Adding a link to a resource that does not exist in the API, or removing a resource from the API
Clients making unauthenticated requests to a resource

To troubleshoot anomalous 4xx responses, you should enable CloudWatch Logs by following this guide, and then determine which endpoints are returning 4xx responses.
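As a minimal sketch of that last step, the Python below counts error responses per endpoint from access log lines you have already exported. It assumes the access log format is JSON and maps $context.status, $context.httpMethod, and $context.resourcePath to the keys status, httpMethod, and resourcePath. Those key names are an assumption of this example, not something API Gateway requires.

```python
import json
from collections import Counter


def count_errors_by_endpoint(log_lines, status_class=4):
    """Count responses in a status class (4 for 4xx, 5 for 5xx) per endpoint.

    Assumes each line is a JSON object with the hypothetical keys
    "status", "httpMethod", and "resourcePath".
    """
    counts = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
            status = int(entry["status"])
        except (ValueError, KeyError, TypeError):
            continue  # skip lines that are not in the assumed format
        if status // 100 == status_class:
            endpoint = (entry.get("httpMethod", "?"), entry.get("resourcePath", "?"))
            counts[endpoint] += 1
    # Endpoints with the most matching responses come first.
    return counts.most_common()
```

Passing status_class=5 gives the same breakdown for 5xx responses.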

API Gateway 5xx Responses

Like 4xx responses, some 5xx responses are probably normal for your API. When the number of 5xx responses stays anomalous for a sustained period, Blue Matador will alert you so that you can fix the problem. Possible causes of these errors include:

A bug was released to your API
Your endpoint is timing out

To troubleshoot anomalous 5xx responses, you should enable CloudWatch Logs by following this guide, and then determine which endpoints are returning 5xx responses.

API Gateway Request Count

API Gateway counts the number of requests made to your API and makes the metric available through CloudWatch. This metric is the count of all requests, including requests that result in an error response. Because this is a metric that helps determine billing, it’s important to monitor for any major changes. Blue Matador provides anomaly detection on this metric so you can know about potentially expensive fluctuations in request count.

If you are experiencing more requests than usual, the following might be true:

A bug in your application code is causing erroneous requests to a particular endpoint or set of endpoints.
A bug in an endpoint is returning error responses, causing a large number of retries.

The following are some potential causes of fewer requests than expected:

The application code that calls an endpoint may be malfunctioning and not making requests.
Permissions issues may be keeping your application from being able to call your API.

In each case, enabling CloudWatch Logs helps you find the endpoint that is receiving the anomalous number of requests.
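Once you have per-endpoint request counts from the logs, a simple comparison against an earlier window can point to the endpoint responsible. The sketch below is only an illustration of that idea, not Blue Matador's anomaly detection. The function name, the input dictionaries, and the min_ratio threshold are all hypothetical.

```python
def request_count_changes(baseline, current, min_ratio=2.0):
    """Return endpoints whose request count rose or fell by at least min_ratio.

    baseline and current map an endpoint name to its request count for two
    time windows of equal length. min_ratio is an arbitrary illustrative
    threshold, not a recommended value.
    """
    changes = []
    for endpoint in sorted(set(baseline) | set(current)):
        before = baseline.get(endpoint, 0)
        after = current.get(endpoint, 0)
        # Add 1 to both counts so an endpoint missing from one window
        # does not cause a division by zero.
        ratio = (after + 1) / (before + 1)
        if ratio >= min_ratio:
            changes.append((endpoint, before, after, "more"))
        elif ratio <= 1 / min_ratio:
            changes.append((endpoint, before, after, "fewer"))
    return changes
```

An endpoint flagged as "more" is a candidate for the retry or erroneous-request causes above, and one flagged as "fewer" is a candidate for the client malfunction or permissions causes.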

API Gateway Request Latency

Latency measures the amount of time from when your API receives a request to when it responds to the request. This metric is important to monitor because you likely have performance requirements for your application, and higher latency often signals bugs.

When debugging latency issues, you should first confirm that your code is the source of the latency. To do so, compare the IntegrationLatency and Latency metrics in CloudWatch. IntegrationLatency measures only the time between API Gateway relaying a request to your backend and receiving the response, while Latency measures the end-to-end time for the request, including API Gateway overhead. If the two metrics are mostly the same, your code is the source of the latency. If IntegrationLatency is much lower than Latency, the extra time is being spent in API Gateway rather than in your code, and you may have to wait for AWS to fix the issue.
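The comparison can be sketched in a few lines of Python. This example assumes you have already retrieved datapoints for both metrics over the same period (for instance, CloudWatch Average values in milliseconds). The function name and the returned keys are hypothetical.

```python
def latency_breakdown(latency_ms, integration_latency_ms):
    """Split average Latency into backend time and API Gateway overhead.

    Both arguments are lists of datapoints in milliseconds that cover the
    same period for the Latency and IntegrationLatency metrics.
    """
    if not latency_ms or not integration_latency_ms:
        raise ValueError("both metrics need at least one datapoint")
    avg_latency = sum(latency_ms) / len(latency_ms)
    avg_integration = sum(integration_latency_ms) / len(integration_latency_ms)
    return {
        "latency_ms": avg_latency,
        "integration_ms": avg_integration,
        # Time not accounted for by the backend integration.
        "overhead_ms": avg_latency - avg_integration,
        # Fraction of end-to-end time spent in the backend (0.0 if no latency).
        "backend_share": avg_integration / avg_latency if avg_latency else 0.0,
    }
```

A backend_share close to 1 suggests that your code accounts for most of the latency. A much smaller share suggests the time is being spent outside your integration.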

If the latency is coming from your application, use CloudWatch Logs to find the endpoint that is taking longer to execute. The log format you enable should contain the $context.responseLatency variable so you can view how long the requests took.
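As a hedged illustration, the sketch below shows one possible JSON access log format that includes $context.responseLatency, followed by a helper that ranks endpoints by average latency from exported log lines. The JSON key names and the function are assumptions made for this example. The $context variables are the values API Gateway substitutes.

```python
import json
from collections import defaultdict

# One possible access log format. The key names on the left are this
# sketch's choice; only the $context variables come from API Gateway.
ACCESS_LOG_FORMAT = (
    '{"httpMethod":"$context.httpMethod",'
    '"resourcePath":"$context.resourcePath",'
    '"status":"$context.status",'
    '"responseLatency":"$context.responseLatency"}'
)


def slowest_endpoints(log_lines, top=5):
    """Return (endpoint, average latency in ms, request count), slowest first."""
    samples = defaultdict(list)
    for line in log_lines:
        try:
            entry = json.loads(line)
            latency = float(entry["responseLatency"])
            endpoint = f'{entry["httpMethod"]} {entry["resourcePath"]}'
        except (ValueError, KeyError, TypeError):
            continue  # skip lines that do not match the assumed format
        samples[endpoint].append(latency)
    averages = [(ep, sum(v) / len(v), len(v)) for ep, v in samples.items()]
    return sorted(averages, key=lambda item: item[1], reverse=True)[:top]
```

Averages can hide occasional slow requests, so it may also be worth looking at the maximum or a high percentile per endpoint.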
