Our comprehensive, automated monitoring provides peace of mind

Below are the services, metrics and events currently monitored by Blue Matador. New integrations and monitorable events are being developed every day, so please check back often. If you're looking for a specific monitored service or integration not listed, please get in touch and we will prioritize it.


Certificate Expiring
Upcoming SSL Certificate Expiration


AWS API Gateway

API Gateway 4xx Responses
Number of 4xx responses

API Gateway 5xx Responses
Number of 5xx responses

API Gateway Request Count
Anomalous number of requests

API Gateway Request Latency
Increase in response latency


AWS Athena

Athena Processed Bytes
Amount of data processed by your queries

Athena Query Time
Increase in average query time in your work group

Athena Failed Queries
Number of failed queries

Athena Canceled Queries
Number of canceled queries

AWS Autoscaling

AutoScaling Capacity
Number of launched servers is less than number needed 



AWS EC2

AWS Event
Upcoming scheduled maintenance events

EC2 Credit Balance
CPU credit usage and credits running low 

EC2 Instance Limit
Approaching limit of instances per region 

EC2 Status Check
Automatic health checks on instance configuration and underlying hardware 

Disk IO
Unexpected changes in disk IOPS 

Disk Latency
Deviations in disk latency 

Network IO
Rates of bytes and packets on the network 

AWS Elastic Beanstalk

Beanstalk Events
Negative events due to environment changes

Beanstalk Health
Automatic health status of your Beanstalk environments 

Beanstalk Latency
Increase in application latency 

Beanstalk Environment Pending
Bootstrapping process stuck for a significant amount of time 

Beanstalk Requests
Inconsistent spike or drop in the ApplicationRequestsTotal metric

AWS CloudFront

Cloudfront 4xx
Unhealthy percentage of requests that result in a 4xx response 

Cloudfront 5xx
Unhealthy percentage of responses with 5xx response codes 

Cloudfront Request Count
Anomalous spike in request counts 

Cloudfront Data Transfer
Unexpected increase in data transferred to CloudFront 

Linux / Windows

CPU Iowait
I/O wait time – CPU idle time with an outstanding disk I/O operation requested 

CPU Steal
Time spent waiting for a real CPU while hypervisor services another virtual processor 

CPU System
Unhealthy level of system CPU (i.e. processes switched to kernel space) 

Disk Inodes
Running out of disk inodes 

Disk Space
Approaching disk space limitations

High Load
Prolonged or abnormally high normalized load 

Dropped Packets
Unhealthy number of dropped packets 

Network Errors
Network errors over a sustained period of time 

Open Files Ulimit
Unable to open files or network sockets 

Threads Ulimit
Unable to spawn new sysV threads

When the server runs out of RAM, the OS uses swap space as memory

Server Time Drift
Server’s time does not match an authoritative time 

Server Unresponsive
Heartbeat created between server and Blue Matador's agent 

Disk IO
Unexpected changes in disk IOPS 

Disk Latency
Deviations in disk latency 

Network IO
Rates of bytes and packets on the network 

AWS DynamoDB

DynamoDB Capacity
Risk of throttling due to insufficient provisioned capacity

DynamoDB Errors
User errors (HTTP 4xx status codes) or system errors (HTTP 5xx status codes)

DynamoDB Latency
Amount of time successful requests take 

DynamoDB Throttles
Unusual throttling due to partition capacity 



AWS EBS

EBS Burst Balance
Approaching burst balance capacity 

EBS IOPS Consumed
Nearing the volume's IOPS limit

EBS Queue Length
Anomalies on the average queue length of a volume 

EBS IOPS Throughput
Lower IOPS throughput than expected 

EBS Volume State
EBS volume in Error state 

EBS Volume Status
Volume state becomes Warning or Impaired



AWS ELB

ELB 400s
Detection of anomalous 400s 

ELB 500s
Detection of anomalous 500s 

ELB Backend Errors
Lack of connection between the load balancer and the host 

ELB Bytes Processed
Anomalies with number of bytes processed 

ELB Unhealthy Hosts
Fewer available targets that can receive traffic than expected 

ELB Latency
Increase in latency 

ELB No Registered Hosts
No registered instances in load balancer 

ELB Region Limits
Nearing per-region limits

ELB Request Count
Anomalous increase/decrease in request count 

ELB Surge Queue
Unusually high surge queue length



AWS IoT

IoT Rules Executed
Number of rules executed

IoT Parse Error
Number of message parse errors

IoT Message Throttled
Number of throttled topic messages

IoT Action Failure
Failures in rule actions

AWS Kinesis

Kinesis Incoming Records
Number of records put to a Kinesis stream 

Kinesis Iterator Age
Consumers falling too far behind the stream

Kinesis Throttling
Throttling due to exceeding limits

AWS Lambda

Lambda Dead Letter Errors
Errors sending the event payload to the dead letter queue 

Lambda Function Duration
Anomaly detection when function duration changes 

Lambda Errors
Lambda function results in an error 

Lambda Invocations
Fluctuations in function invocations 

Lambda Iterator Age
Time between when a record is written to the stream and when Lambda reads it 

Lambda Throttling
Throttling due to concurrency limit 

Lambda Timeout
Average function duration approaching timeout


AWS ElastiCache

ElastiCache CPU
High CPU on a cache node

ElastiCache Swapping
Swapping on a cache node

ElastiCache Evictions
Anomalous evictions

ElastiCache Connections
Anomalous number of connections

ElastiCache Replication Lag
High replication lag on a cache node



Kubernetes

Kubernetes API Health
Health checks of Kubernetes API 

Kubernetes Backoff Limit
Job failures have reached the backoff limit

Kubernetes Component Statuses
Health status of essential Kubernetes components

Kubernetes Container Restarts
Restarting containers

Kubernetes Creating Load Balancer
Failed to create Load Balancer

Kubernetes DaemonSet Unhealthy
Unhealthy number of DaemonSets

Kubernetes Deleting Load Balancer
Failed to delete Load Balancer

Kubernetes Deployment Unhealthy
Monitoring for pod scheduling and life cycles

Kubernetes Eviction Threshold
Node will evict pods to reclaim resources

Kubernetes Failed PreStop Hook
Pods are failing PreStop hooks

Kubernetes Failed Volume Mount
Failed start due to inability of pod to mount all volumes

Kubernetes Job Failed
Jobs running in cluster fail

Kubernetes Node Conditions
Health of several key node metrics

Kubernetes Node OOM
Node out of memory

Kubernetes Node Pod Capacity
Approaching pod limit for each node

Kubernetes Node Rebooted
Node has been rebooted

Kubernetes Node Resources
Capacity tracking for CPU and memory allocated to each pod

Kubernetes OOM Containers
Container out of memory 

Kubernetes Pod Pending
Pod could not be scheduled

Kubernetes Pod Terminating
Pod stuck in terminating state 

Kubernetes Resolv Conf
Node resolv.conf file contains errors

Kubernetes Service Without Endpoints
Service has no defined endpoints

Kubernetes Unavailable Load Balancer
Load Balancer is unavailable

Kubernetes Updating Load Balancer
Failed to update Load Balancer

Kubernetes Waiting Containers
Containers stuck in a waiting state



AWS RDS

RDS Cluster Status
Health status of Aurora cluster 

RDS Commit Latency
Database query latency (commit) 

RDS Select Latency
Database query latency (select) 

RDS Commit Throughput
Anomalous number of commit operations database is handling 

RDS Select Throughput
Anomalous number of select operations database is handling 

RDS Connections
Nearing maximum connections to the database 

RDS CPU Utilization
Percent of instance's CPU being consumed 

RDS Deadlocks
Transactions hold locks that other transactions require 

Anomalous disk throughput 

RDS Free Memory
Amount of unused memory on a database instance 

RDS Instance Status
Health of RDS database instance 

RDS Network IO
Anomalous network traffic to and from each DB instance 

RDS Replica Lag
How far an Aurora replica’s data is behind the data in the primary instance 

RDS Restore Time
Latest restorable point in time for creating a copy of the database falls behind

RDS Event
Upcoming scheduled maintenance events 

AWS Route53

Route53 Domain Expiring
Notifications when domains are 30 days from expiration 

Route53 Health Check
Health of web applications, CloudWatch alarms, or even other health checks 

Route53 Zone NS
Check for the configured NS Record for a hosted zone against the public DNS lookup for the domain 



AWS S3

S3 Cors Changed
Change in CORS configuration 

S3 Policy Changed
Changes in the bucket policy 

S3 Replication Bucket
Bucket is configured to replicate to a bucket that does not exist 

S3 Website Changed
Changes to static website configuration



AWS SES

SES Bounces
Bounce rate compared to acceptable values 

SES Complaints
Complaint rates exceeding acceptable values 

SES Domain
Verification status and DKIM settings for all of your verified SES domains 

SES Quota
Per-second and per-day message sent limits 

SES Rejects
Rate of rejected messages 

SES Sends
Number of sends compared to number of deliveries  



AWS SNS

SNS Failed Notifications
Message failed to be sent to subscriber 

SNS Published Messages
Unexpected number of messages were published



AWS SQS

SQS Nonempty Dead Letter Queue
Dead letter queue is nonempty, indicating failed messages

SQS Dead Letter Retention
Retention period of the dead letter queue relative to the retention period of its source queue

SQS Delay Retention
Configured message delay relative to the message retention period

SQS Inflight Messages Limit
Messages received by a consumer but not yet deleted 

SQS FIFO Operations Limit
FIFO queues are nearing the per-second operations limit

SQS Max Receives
Configuration of maximum receives 

SQS Message Size
Approaching documented limits 

SQS Messages Sent
Anomalies in the number of messages sent to an SQS Queue



AWS ECS

ECS Cluster Resource Utilization
Approaching 100% CPU utilization or memory utilization 

ECS Service Resource Utilization
Warning when approaching 100% utilization 

ECS Running Tasks
Fewer running tasks than expected

ECS Task Connectivity
Whether the task is connected to ECS

ECS Task Health
Health of running tasks

ECS Task Stopped
Task unexpectedly stops

ECS Task Pending
Tasks taking a significant amount of time to enter the running state


AWS Elasticsearch

Elasticsearch Storage Used
Approaching disk space limitations

Elasticsearch Writes Blocked
Cluster is blocking writes

Elasticsearch CPU Utilization
High CPU on nodes

Elasticsearch Master CPU Utilization
High CPU on master

Elasticsearch JVM Pressure
High JVM pressure on nodes

Elasticsearch Master JVM Pressure
High JVM pressure on master

Elasticsearch Master Reachability
Master node is unresponsive

Elasticsearch KMS Errors
KMS disabled or deleted

Elasticsearch Node Count
Fewer nodes than configured

Elasticsearch Health Status
Unhealthy Elasticsearch status

Use cases for Blue Matador

How do I monitor HealthyHostCount in ELB?

ELBs present over 50 options and metrics during setup and operation. HealthyHostCount is just the tip of the iceberg for ensuring ELBs are healthy, yet most organizations stop monitoring at this one metric.

Blue Matador monitors HealthyHostCount in ELB
Blue Matador automatically alerts on HealthyHostCount, 5xx errors, lopsided zones and instances, and more.

Which metrics in RDS need to be monitored?

All of them! In a managed database environment, you run the risk of unknown problems like secondary failures, replica lag, and deadlocks. Monitoring the bare minimum is a recipe for disaster.

Blue Matador monitors RDS metrics
Blue Matador proactively monitors the cluster status, CloudWatch metrics, and engine statistics of all your RDS instances.

What happens when I run out of CPU Credits on EC2?

CPU Credits are Amazon’s way of handling oversubscribed hardware. When you run out, your access to CPU time is throttled. 8 out of 10 sysadmins don’t realize that CPU steal can occur even with a full CPU credit balance.

Blue Matador monitors EC2 CPU Credits
Our tool uses leading indicators, like CPU credit balance and steal time, to give pre-emptive, actionable alerts and warnings.
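
To make "credits running low" concrete, here is a minimal sketch of the arithmetic behind a credit-depletion estimate. It relies on one documented fact: a CPU credit equals one vCPU running at 100% for one minute. The function name, its parameters, and the numbers in the usage note are illustrative assumptions; this is not how Blue Matador computes its alerts.

```python
from typing import Optional


def minutes_until_credits_depleted(
    balance: float, earned_per_hour: float, used_per_hour: float
) -> Optional[float]:
    """Estimate minutes until the CPU credit balance reaches zero.

    All three inputs are hypothetical readings you would take from your own
    metrics. Returns None when credits are earned at least as fast as they
    are spent, so the balance is not shrinking.
    """
    net_burn_per_hour = used_per_hour - earned_per_hour
    if net_burn_per_hour <= 0:
        return None
    # Hours of runway at the current burn rate, converted to minutes.
    return balance / net_burn_per_hour * 60
```

For example, with a balance of 72 credits, 12 credits earned per hour, and 48 spent per hour, the net burn is 36 credits per hour and the estimate is 120 minutes before throttling begins.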

Why is my Kubernetes pod stuck in Pending?

Kubernetes pods can get stuck in Pending state when the cluster doesn’t have enough resources or when there’s an issue with the specified image.

Blue Matador monitors Kubernetes containers
Blue Matador catches pods in pending state, broken masters, unhealthy deployments, pod limits on a node, and more.
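
For a manual look at why a pod is Pending, the sketch below reads the JSON produced by `kubectl get pods -o json` and collects the reasons Kubernetes reports in each pod's status. The function name is hypothetical, and the code only covers the two causes mentioned above (scheduling failures and image problems); treat it as an illustration, not a complete diagnostic.

```python
from typing import Dict, List


def pending_reasons(pod_list: dict) -> Dict[str, List[str]]:
    """Map each Pending pod's name to the reasons reported in its status."""
    result: Dict[str, List[str]] = {}
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        reasons: List[str] = []
        # Scheduling problems (for example insufficient CPU or memory) show
        # up as a PodScheduled condition whose status is "False".
        for cond in status.get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                reasons.append(cond.get("reason", "Unknown"))
        # Image problems show up as containers held in a waiting state.
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                reasons.append(waiting.get("reason", "Unknown"))
        result[pod["metadata"]["name"]] = reasons
    return result
```

To use it, save the output of `kubectl get pods --all-namespaces -o json` to a file, load it with `json.load`, and pass the resulting dictionary to `pending_reasons`.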

How do I tell when my SSL certificates expire?

95% of all engineers use a spreadsheet or rely on the CA’s reminder email for SSL certificate expirations. The reminder is a single point of failure that causes catastrophic results for software companies each year.

Blue Matador monitors ACM SSL certificates
Blue Matador discovers expiring certificates in AWS ACM for ELBs, ALBs, and CloudFront.
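
As a point of comparison with the spreadsheet approach, here is a minimal sketch of checking a live endpoint's certificate with Python's standard library. It inspects whatever certificate a host serves over TLS rather than querying AWS ACM, and the function names and the 30-day threshold mentioned afterward are illustrative assumptions.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by getpeercert(), for example
    "Jan  5 09:34:43 2018 GMT". The result is negative once the
    certificate has expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a TLS connection and return the peer certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Calling `days_until_expiry(fetch_not_after(host), datetime.now(timezone.utc))` for each of your hostnames and warning when the result drops below a threshold such as 30 days gives a basic scheduled check. It still has to be run and watched by someone, which is the gap automated monitoring is meant to close.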

Why are there no automated alerts in CloudWatch?

CloudWatch is reactive, along with most other monitoring tools. It focuses on root cause analysis, data visualization, and customized monitoring of your resources, not automated alerting.

Blue Matador automatically monitors your AWS infrastructure
Blue Matador is the automated alerting tool you've always wanted.

Get started with alert automation
See how it'll save you time and toil