Our comprehensive, automated monitoring provides peace of mind

Below are the services, metrics and events currently monitored by Blue Matador. New integrations and monitorable events are being developed every day, so please check back often. If you're looking for a specific monitored service or integration not listed, please get in touch and we will prioritize it.

AWS ACM

Certificate Expiring
Upcoming SSL Certificate Expiration


AWS Autoscaling

AutoScaling Capacity
Number of launched servers is less than number needed 


AWS EC2

AWS Event
Upcoming scheduled maintenance events

EC2 Credit Balance
CPU credit usage and credits running low 

EC2 Instance Limit
Approaching limit of instances per region 

EC2 Status Check
Automatic health checks on instance configuration and underlying hardware 

Disk IO
Unexpected changes in disk IOPS 

Disk Latency
Deviations in disk latency 

Network IO
Rates of bytes and packets on the network 


AWS Elastic Beanstalk

Beanstalk Events
Negative events due to environment changes

Beanstalk Health
Automatic health status of your Beanstalk environments 

Beanstalk Latency
Increase in application latency 

Beanstalk Environment Pending
Bootstrapping process stuck for a significant amount of time 

Beanstalk Requests
Inconsistent spike or drop in the ApplicationRequestsTotal metric


AWS Cloudfront

Cloudfront 4xx
Unhealthy percentage of requests that result in a 4xx response 

Cloudfront 5xx
Unhealthy percentage of responses with 5xx response codes 

Cloudfront Request Count
Anomalous spike in request counts 

Cloudfront Data Transfer
Unexpected increase in data transferred to CloudFront 


Linux / Windows

CPU Iowait
I/O wait time – CPU idle time with an outstanding disk I/O operation requested 

CPU Steal
Time spent waiting for a real CPU while hypervisor services another virtual processor 

CPU System
Unhealthy level of system CPU (i.e. processes switched to kernel space) 

Disk Inodes
Running out of disk inodes 

Disk Space
Approaching disk space limitations

High Load
Prolonged or abnormally high normalized load 

Dropped Packets
Unhealthy number of dropped packets 

Network Errors
Network errors over a sustained period of time 

Open Files Ulimit
Unable to open files or network sockets 

Threads Ulimit
Unable to spawn new sysV threads

Swapping
When the server runs out of RAM, the OS uses swap space as memory

Server Time Drift
Server’s time does not match an authoritative time 

Server Unresponsive
Heartbeat created between server and Blue Matador's agent 

Disk IO
Unexpected changes in disk IOPS 

Disk Latency
Deviations in disk latency 

Network IO
Rates of bytes and packets on the network 


AWS DynamoDB

DynamoDB Capacity
Risk of throttling due to insufficient provisioned capacity

DynamoDB Errors
User errors (HTTP 4xx status codes) or system errors (HTTP 5xx status codes)

DynamoDB Latency
Amount of time successful requests take 

DynamoDB Throttles
Unusual throttling due to partition capacity 


AWS EBS

EBS Burst Balance
Approaching burst balance capacity 

EBS IOPS Consumed
Nearing the volume's IOPS limit

EBS Queue Length
Anomalies on the average queue length of a volume 

EBS IOPS Throughput
Lower IOPS throughput than expected 

EBS Volume State
EBS volume in Error state 

EBS Volume Status
Volume status becomes Warning or Impaired


AWS ELB

ELB 400s
Detection of anomalous 400s 

ELB 500s
Detection of anomalous 500s 

ELB Backend Errors
Lack of connection between the load balancer and the host 

ELB Bytes Processed
Anomalies with number of bytes processed 

ELB Unhealthy Hosts
Fewer available targets that can receive traffic than expected 

ELB Latency
Increase in latency 

ELB No Registered Hosts
No registered instances in load balancer 

ELB Region Limits
Nearing per-region limits

ELB Request Count
Anomalous increase/decrease in request count 

ELB Surge Queue
Unusually high surge queue length


AWS Kinesis

Kinesis Incoming Records
Number of records put to a Kinesis stream 

Kinesis Iterator Age
Kinesis getting too far behind

Kinesis Throttling
Throttling due to exceeding limits


AWS Lambda

Lambda Dead Letter Errors
Errors sending the event payload to the dead letter queue 

Lambda Function Duration
Anomaly detection when function duration changes 

Lambda Errors
Lambda function results in an error 

Lambda Invocations
Fluctuations in function invocations 

Lambda Iterator Age
Time between when a record is written to the stream and when Lambda reads it 

Lambda Throttling
Throttling due to concurrency limit 

Lambda Timeout
Average function duration approaching timeout


AWS ElastiCache

ElastiCache CPU
High CPU on a cache node

ElastiCache Swapping
Swapping on a cache node

ElastiCache Evictions
Anomalous evictions

ElastiCache Connections
Anomalous number of connections

ElastiCache Replication Lag
High replication lag on a cache node

Kubernetes 

Kubernetes API Health
Health checks of Kubernetes API 

Kubernetes Component Statuses
Health status of essential Kubernetes components 

Kubernetes DaemonSet Unhealthy
Unhealthy number of DaemonSets 

Kubernetes Deployment Unhealthy
Monitoring for pod scheduling and life cycles 

Kubernetes Job Failed
Jobs running in cluster fail 

Kubernetes Node Conditions
Health of several key node metrics 

Kubernetes Node Pod Capacity
Approaching pod limit for each node 

Kubernetes Node Resources
Capacity tracking for CPU and memory allocated to each pod 

Kubernetes Failed Volume Mount
Failed start due to inability of pod to mount all volumes 

Kubernetes OOM Containers
Container out of memory 

Kubernetes Container Restarts
Restarting containers 

Kubernetes Pod Terminating
Pod stuck in terminating state 

Kubernetes Pod Pending
Pod could not be scheduled 

Kubernetes Waiting Containers
Containers stuck in a waiting state

Kubernetes Service Without Endpoints
Service has no defined endpoints


AWS RDS

RDS Cluster Status
Health status of Aurora cluster 

RDS Commit Latency
Database query latency (commit) 

RDS Select Latency
Database query latency (select) 

RDS Commit Throughput
Anomalous number of commit operations database is handling 

RDS Select Throughput
Anomalous number of select operations database is handling 

RDS Connections
Nearing maximum connections to the database 

RDS CPU Utilization
Percent of instance's CPU being consumed 

RDS Deadlocks
Transactions hold locks that other transactions require 

RDS Disk IO
Anomalous disk throughput 

RDS Free Memory
Amount of unused memory on a database instance 

RDS Instance Status
Health of RDS database instance 

RDS Network IO
Anomalous network traffic to and from each DB instance 

RDS Replica Lag
How far an Aurora replica’s data is behind the data in the primary instance 

RDS Restore Time
Latest restorable point in time for the database falls behind

RDS Event
Upcoming scheduled maintenance events 


AWS Route53

Route53 Domain Expiring
Notifications when domains are 30 days from expiration 

Route53 Health Check
Health of web applications, CloudWatch alarms, or even other health checks 

Route53 Zone NS
Check for the configured NS Record for a hosted zone against the public DNS lookup for the domain 


AWS S3

S3 Cors Changed
Change in CORS configuration 

S3 Policy Changed
Changes in the bucket policy 

S3 Replication Bucket
Bucket is configured to replicate to a bucket that does not exist 

S3 Website Changed
Changes to static website configuration


AWS SES

SES Bounces
Bounce rate compared to acceptable values 

SES Complaints
Complaint rates exceeding acceptable values 

SES Domain
Verification status and DKIM settings for all of your verified SES domains 

SES Quota
Per-second and per-day message sent limits 

SES Rejects
Rate of rejected messages 

SES Sends
Number of sends compared to number of deliveries  


AWS SNS

SNS Failed Notifications
Message failed to be sent to subscriber 

SNS Published Messages
Unexpected number of messages were published


AWS SQS

SQS Nonempty Dead Letter Queue
Dead letter queue is nonempty, indicating failed messages

SQS Dead Letter Retention
Dead letter queue retention period relative to the source queue's retention period

SQS Delay Retention
Configured message delay relative to the message retention period

SQS Inflight Messages Limit
Messages received by a consumer but not yet deleted 

SQS FIFO Operations Limit
FIFO queues are nearing the per second limit 

SQS Max Receives
Configuration of maximum receives 

SQS Message Size
Approaching documented limits 

SQS Messages Sent
Anomalies in the number of messages sent to an SQS Queue


AWS ECS

ECS Cluster Resource Utilization
Approaching 100% CPU utilization or memory utilization 

ECS Service Resource Utilization
Warning when approaching 100% utilization 

ECS Running Tasks
Fewer running tasks than expected

ECS Task Connectivity
Whether the task is connected to ECS

ECS Task Health
Health of running tasks

ECS Task Stopped
Task unexpectedly stops

ECS Task Pending
Tasks taking a significant amount of time to enter the running state


AWS Elasticsearch

Elasticsearch Storage Used
Approaching disk space limitations

Elasticsearch Writes Blocked
Cluster is blocking writes

Elasticsearch CPU Utilization
High CPU on nodes

Elasticsearch Master CPU Utilization
High CPU on master

Elasticsearch JVM Pressure
High JVM pressure on nodes

Elasticsearch Master JVM Pressure
High JVM pressure on master

Elasticsearch Master Reachability
Master node is unresponsive

Elasticsearch KMS Errors
KMS disabled or deleted

Elasticsearch Node Count
Fewer nodes than configured

Elasticsearch Health Status
Unhealthy Elasticsearch status

How do I monitor HealthyHostCount in ELB?


ELBs present over 50 options and metrics during setup and operation. HealthyHostCount is just the tip of the iceberg for ensuring ELBs are healthy, yet most organizations stop monitoring at this one metric.

Blue Matador monitors HealthyHostCount in ELB
Blue Matador automatically alerts on HealthyHostCount, 5xx errors, lopsided zones and instances, and more.
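
As a rough illustration of what a basic healthy-host check might look like, here is a minimal Python sketch. It is not Blue Matador's actual logic: the function name, the `min_healthy_fraction` parameter, and its default of 0.5 are hypothetical. The two counts are assumed to come from a source such as the CloudWatch `HealthyHostCount` and `UnHealthyHostCount` metrics.

```python
def healthy_host_alert(healthy, unhealthy, min_healthy_fraction=0.5):
    """Return an alert message, or None when the load balancer looks fine.

    The name and the default threshold are illustrative assumptions.
    """
    registered = healthy + unhealthy
    if registered == 0:
        return "no registered hosts"
    if healthy == 0:
        return "no healthy hosts"
    if healthy / registered < min_healthy_fraction:
        return f"only {healthy} of {registered} hosts healthy"
    return None
```

A fixed threshold like this is the simplest option; it says nothing about 5xx errors or lopsided zones, which need their own checks.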

Which metrics in RDS need to be monitored?


All of them! In a managed database environment, you run the risk of unknown problems like secondary failures, replica lag, and deadlocks. Monitoring the bare-minimum is a recipe for disaster.

Blue Matador monitors RDS metrics
Blue Matador proactively monitors the cluster status, CloudWatch metrics, and engine statistics of all your RDS instances.

What happens when I run out of CPU Credits on EC2?


CPU Credits are Amazon’s way of handling oversubscribed hardware. When you run out, your access to CPU time will be throttled. 8/10 sysadmins don’t realize that CPU stealing can take place even with full CPU credits.

Blue Matador monitors EC2 CPU Credits
Our tool uses leading indicators, like CPU credit balance and steal time, to give pre-emptive, actionable alerts and warnings.
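
To make the idea of a leading indicator concrete, the sketch below estimates how long a credit balance will last at the current burn rate. It is an assumption-laden illustration, not Blue Matador's method: the function and parameter names are hypothetical, and the inputs are assumed to be derived from values such as the CloudWatch `CPUCreditBalance` and `CPUCreditUsage` metrics plus the instance type's hourly earn rate.

```python
def hours_until_credits_exhausted(balance, earned_per_hour, spent_per_hour):
    """Estimate hours until the CPU credit balance reaches zero.

    Returns None when credits are not draining (earn rate >= spend rate).
    All names here are illustrative assumptions.
    """
    net_drain = spent_per_hour - earned_per_hour
    if net_drain <= 0:
        return None
    return balance / net_drain
```

For example, with made-up numbers, a balance of 144 credits that is earning 12 and spending 60 per hour would last about three hours, which leaves time to act before throttling begins.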

Why is my Kubernetes pod stuck in Pending?


Kubernetes pods can get stuck in the Pending state when the cluster doesn’t have enough resources or when there’s an issue with the specified image.

Blue Matador monitors Kubernetes containers
Blue Matador catches pods in pending state, broken masters, unhealthy deployments, pod limits on a node, and more.
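
A simple way to spot stuck pods yourself is to look for pods whose phase has been Pending for longer than some grace period. The Python sketch below shows that idea on plain dictionaries shaped like Kubernetes Pod objects (`metadata.name`, `metadata.creationTimestamp`, `status.phase`). The function name and the five-minute default are hypothetical, and the timestamps are assumed to be already parsed into timezone-aware `datetime` values.

```python
from datetime import datetime, timedelta, timezone


def stuck_pending_pods(pods, now, max_pending=timedelta(minutes=5)):
    """Return names of pods in the Pending phase for longer than max_pending.

    `pods` is a list of dicts shaped like Kubernetes Pod objects; the
    function name and the default grace period are illustrative assumptions.
    """
    stuck = []
    for pod in pods:
        if pod["status"]["phase"] != "Pending":
            continue
        age = now - pod["metadata"]["creationTimestamp"]
        if age > max_pending:
            stuck.append(pod["metadata"]["name"])
    return stuck
```

This only flags the symptom. Finding the cause (insufficient resources versus an image problem) still requires inspecting the pod's events and container statuses.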

How do I tell when my SSL certificates expire?


95% of all engineers use a spreadsheet or rely on the CA’s reminder email for SSL certificate expirations. The reminder is a single point of failure that causes catastrophic results for software companies each year.

Blue Matador monitors ACM SSL certificates
Blue Matador discovers expiring certificates in AWS ACM for ELBs, ALBs, and CloudFront.
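
As an alternative to a spreadsheet, an expiry check can be scripted. The sketch below is a minimal Python illustration, not Blue Matador's implementation. It assumes each certificate is a dictionary with `DomainName` and `NotAfter` keys, mirroring the fields ACM reports for a certificate; the function name and the 30-day default window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expiring_certificates(certs, now, warn_within=timedelta(days=30)):
    """Return (domain, days_left) pairs for certificates expiring soon.

    Results are sorted soonest first; already-expired certificates appear
    with a negative days_left. Names and the default window are assumptions.
    """
    expiring = []
    for cert in certs:
        remaining = cert["NotAfter"] - now
        if remaining <= warn_within:
            expiring.append((cert["DomainName"], remaining.days))
    return sorted(expiring, key=lambda pair: pair[1])
```

Running a check like this on a schedule removes the dependency on a single reminder email, though it only covers certificates that are in the list you feed it.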

Why are there no automated alerts in CloudWatch?


CloudWatch is reactive, along with most other monitoring tools. It focuses on root cause analysis, data visualization, and customized monitoring of your resources, not automated alerting.

Blue Matador automatically monitors your AWS infrastructure
Blue Matador is the automated alerting tool you've always wanted.
