Introducing the first and only metric that tracks how well DevOps teams respond to issues affecting infrastructure health. Proactively address issues to increase your index score so you know your systems are operating at their best.
Be alerted to immediate threats to your infrastructure’s health. Visualize how many alerts you are receiving on your pre-configured graph widget. See the number of active alerts and how many have been resolved in the last 24 hours.
Resolve your proactive warnings to keep healthy infrastructure resources from experiencing stability issues in the future. Warnings are based on more than 30 active system checks that we set up automatically for all your servers. Using the growth, decay, and seasonality of your data, together with machine learning, we correlate issues for you so you can focus on proactively fixing them.
See all of Blue Matador's recommendations in one place so you can focus on proactively maintaining infrastructure health, not just reacting to existing issues. Each recommendation has details, history, and troubleshooting tips to help you know exactly what to do to prevent an outage.
Your timeline displays all your anomalies, alerts, and warnings over a time period so you can see the sequence of events across your entire infrastructure. As you review anomalies detected in your systems’ data, you’ll gain insight into which issues lead to alerts.
When you receive a new warning or alert, we notify you. How you get notified is highly configurable based on your team's unique workflow. For example, critical issues can go to PagerDuty while everything else is routed to Slack.
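As a rough illustration of the routing idea described above, here is a minimal Python sketch. It is not Blue Matador's configuration format or API; the `Event` type, the severity labels, and the channel names are all hypothetical and exist only to show how "critical goes to PagerDuty, everything else goes to Slack" might be expressed as a rule table.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical routing table: severity label -> notification channels.
# The labels and channel names are illustrative assumptions, not product settings.
ROUTES: Dict[str, List[str]] = {
    "alert": ["pagerduty"],  # critical issues page the on-call engineer
}
DEFAULT_ROUTE: List[str] = ["slack"]  # everything else goes to chat


@dataclass
class Event:
    severity: str  # e.g. "alert", "warning", "anomaly" (assumed labels)
    summary: str


def route(event: Event) -> List[str]:
    """Return the channels an event should be sent to under the table above."""
    return ROUTES.get(event.severity, DEFAULT_ROUTE)


if __name__ == "__main__":
    for e in (Event("alert", "disk nearly full"), Event("warning", "memory growth")):
        print(e.severity, "->", route(e))
```

Keeping the rules in a table rather than in branching logic means a team could change its workflow by editing data, which matches the idea of routing being configurable per team.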
Watch actual recommendations arrive for live issues.
Alerts are critical issues you need to address immediately. We’ll route these alerts to your incident management system and to Slack so you are notified right away, in the way you are used to.
Warnings are issues that need to be addressed but are not worth waking up in the middle of the night for. These can be acknowledged and addressed on your own schedule by creating a task in your incident management system.
Anomalies are items that are abnormal but don’t represent an immediate threat to your systems’ health. Anomalies are presented alongside Alerts and Warnings — in sequence — in the new Anomaly Timeline. The timeline provides context for Alerts and Warnings, helping you find the root cause of an issue without having to sift through graphs and data.
Digest Emails are daily or weekly summaries of the recent incidents in your infrastructure. They’re a useful way to show others the value you provide by maintaining optimal system health.
Troubleshooting Tips explain short- and long-term solutions to more than 30 types of issues to enable you to resolve incidents faster. They’re especially useful to new team members or junior DevOps engineers just learning the ropes of your infrastructure.