Work Groups organize your Athena queries and expose metrics for query run time, data processed, failures, and canceled queries. Blue Matador automatically collects these metrics from CloudWatch-enabled Work Groups and analyzes them for anomalous behavior. If you have not enabled CloudWatch metrics for your Work Groups, including the default Work Group created by AWS, you will need to do so before Blue Matador can monitor them.
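As a minimal sketch, assuming you manage Athena from Python with boto3, CloudWatch metrics can be turned on for a Work Group through the `UpdateWorkGroup` API. The helper names below are illustrative and not part of Blue Matador; `primary` is assumed to be the name of the default Work Group in your account.

```python
def metrics_update_request(work_group: str) -> dict:
    """Build the UpdateWorkGroup arguments that turn on CloudWatch metrics."""
    return {
        "WorkGroup": work_group,
        "ConfigurationUpdates": {"PublishCloudWatchMetricsEnabled": True},
    }


def enable_cloudwatch_metrics(work_group: str, client=None) -> None:
    """Apply the change; needs AWS credentials allowed to update the Work Group."""
    if client is None:
        import boto3  # imported lazily so the request builder works without boto3

        client = boto3.client("athena")
    client.update_work_group(**metrics_update_request(work_group))
```

Calling `enable_cloudwatch_metrics("primary")` would apply the setting to the default Work Group. The same option can also be changed by editing the Work Group in the web console.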
The amount of data processed in Athena is primarily used for billing, but it can also be a good indicator of query performance. This metric includes data for both successful and canceled queries, and anomalies in it could indicate an issue, especially if data usage control limits are in place. An unexpectedly high amount of processed data could result in a costly Athena bill, which can be alleviated by optimizing your queries to scan less data. Converting data to a columnar format such as Parquet lets Athena read only the columns a query needs, which can significantly reduce the data scanned and improve query performance.
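One way to convert an existing table to Parquet is an Athena `CREATE TABLE AS SELECT` (CTAS) statement. The sketch below only builds the SQL text; the table names and S3 location are hypothetical placeholders, and the helper does no escaping, so it assumes trusted input.

```python
def parquet_ctas(source_table: str, target_table: str, s3_location: str) -> str:
    """Return an Athena CTAS statement that copies a table into Parquet files."""
    # Inputs are interpolated as-is: do not pass user-supplied values here.
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{s3_location}')\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical names, for illustration only.
print(parquet_ctas("logs_csv", "logs_parquet", "s3://example-bucket/logs_parquet/"))
```

Running the generated statement in Athena scans the source table once, so it is worth comparing the data processed by typical queries before and after the conversion.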
Blue Matador monitors the query time of successful queries for anomalies. If a critical piece of your application depends on consistent query times in Athena, a significant increase in query time can degrade your application's performance. The primary ways to control query times are to optimize your queries, cache results in your application where possible, and convert your data to a columnar format.
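Caching results in the application can be as simple as remembering a query's result for a fixed time. The following is a minimal in-memory sketch, not a production cache: `QueryResultCache` and the `run_query` callable are hypothetical names, and `run_query` stands in for whatever code actually submits the SQL to Athena and returns its rows.

```python
import time
from typing import Any, Callable, Dict, Tuple


class QueryResultCache:
    """Keep query results in memory for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_run(self, sql: str, run_query: Callable[[str], Any]) -> Any:
        now = self._clock()
        cached = self._entries.get(sql)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: skip the Athena round trip
        result = run_query(sql)  # hypothetical callable that runs the query
        self._entries[sql] = (now, result)
        return result
```

The cache is keyed on the exact SQL text, so two queries that differ only in whitespace are stored separately. Choose the time-to-live based on how stale a result your application can tolerate.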
Failed queries are queries that Athena is unable to complete. A query may fail because of a syntax error or an error calling a function on data in the query. You can get specific information on failed queries by viewing the Query History in the web console. Each failed query has an Error details link in the Action column of the history.
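The same error details are also available through the Athena API. As a sketch, assuming you have already fetched query execution records (for example the `QueryExecutions` list returned by boto3's `batch_get_query_execution`), the hypothetical helper below picks out the failed ones and the reason Athena recorded for each.

```python
from typing import Dict, List


def failed_query_reasons(executions: List[dict]) -> Dict[str, str]:
    """Map each failed query's ID to the reason Athena recorded for it."""
    reasons = {}
    for execution in executions:
        status = execution.get("Status", {})
        if status.get("State") == "FAILED":
            reasons[execution["QueryExecutionId"]] = status.get(
                "StateChangeReason", "no reason recorded"
            )
    return reasons
```

Grouping the returned reasons by message is a quick way to tell a single bad query apart from a problem affecting many queries at once.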
Canceled queries are the result of either a query being explicitly canceled by a user or the API, or automatic cancellation when a query exceeds the data usage control limits in place for the Work Group. If queries are canceled due to data usage control, you should investigate which queries in the Query History were canceled and whether they can be optimized to stay under the data limit. The underlying data may have grown significantly since the limits were set, in which case raising the limits may help.
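To decide which canceled queries to investigate first, it can help to rank them by how much data they had scanned. This is a sketch under the assumption that you have query execution records in the shape returned by boto3's `batch_get_query_execution`; the helper name is hypothetical. Note that the Athena API spells the state `CANCELLED`.

```python
from typing import List, Tuple


def canceled_by_data_scanned(executions: List[dict]) -> List[Tuple[str, int]]:
    """Return (query ID, bytes scanned) for canceled queries, largest first."""
    canceled = [
        (e["QueryExecutionId"], e.get("Statistics", {}).get("DataScannedInBytes", 0))
        for e in executions
        if e.get("Status", {}).get("State") == "CANCELLED"  # Athena's spelling
    ]
    return sorted(canceled, key=lambda pair: pair[1], reverse=True)
```

Queries at the top of the list are the likeliest candidates for optimization, or a sign that the Work Group's data limit no longer matches the size of the underlying data.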