Blue Matador monitors your Elasticsearch domains for sustained high CPU usage to help you diagnose performance issues. High CPU utilization in Amazon Elasticsearch can severely impact your nodes' ability to index and query documents. Occasional spikes or short periods of 100% CPU usage are expected when indexing or querying large amounts of data, but sustained high CPU usage should be investigated.

    High CPU Causes

    Sustained high CPU usage on data nodes can be caused by a variety of issues. Depending on your workload, cluster size, index mappings, and node size, any of the following issues may cause high CPU usage.

    Large or frequent queries or writes: If your application performs large queries or writes very often, you may need to resize your cluster or nodes to meet that workload's performance requirements. If you know of a problematic index that receives a lot of traffic, check its mappings to see if there are optimizations to make. For instance, if you have text fields that do not need to be searchable, ensure that the index parameter in the mapping is set to false. The type of query you run can also have a significant impact on performance; for example, a query using function_score may take more CPU than expected.
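As a sketch, a mapping for a hypothetical logs index (Elasticsearch 7.x syntax) that stores a message text field without making it searchable might look like this; the index and field names are illustrative:

```json
PUT /logs
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "index": false
      }
    }
  }
}
```

With "index": false, the field is still stored and returned in results, but Elasticsearch skips building the inverted index for it, saving CPU and disk at write time.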

    JVM Memory Pressure: Amazon Elasticsearch reserves half of each node's memory for the Java heap. It is possible that your nodes have exhausted the allocated heap space and are in a state of nearly constant garbage collection. Take steps to reduce the memory footprint of your indices, or increase the node size to get more memory.
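As a rough sketch of the half-of-memory rule above, the helper below estimates a node's heap from its instance RAM. The ~32 GiB cap is an assumption based on common JVM practice (keeping compressed object pointers enabled), not a figure from this article:

```python
def approx_heap_gib(instance_ram_gib):
    """Estimate the Java heap on an Amazon Elasticsearch node.

    Amazon Elasticsearch reserves half of the instance memory for the
    JVM heap. The 32 GiB ceiling is an assumed cap (common JVM practice
    so compressed object pointers stay enabled).
    """
    return min(instance_ram_gib / 2, 32)

# r4.2xlarge.elasticsearch has 61 GiB of RAM
print(approx_heap_gib(61))  # 30.5
```

This makes it easy to see why simply picking a bigger instance eventually stops adding heap: past roughly 64 GiB of RAM, the extra memory goes to the filesystem cache rather than the JVM.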

    Too many open indices: Each Elasticsearch shard is actually stored as a Lucene index, so an index with multiple shards and replicas results in multiple Lucene indices. Depending on cluster size, your cluster's performance might degrade with too many open indices. A general rule of thumb is to keep fewer than 10,000 open shards on a cluster; having thousands of shards open can have a significant impact on cluster management. Using dedicated master nodes can help ensure that CPU utilization from cluster management does not affect data nodes. Below are the recommended master node sizes based on the number of data nodes:

    • 5-10 nodes: m3.medium.elasticsearch

    • 10-20 nodes: m4.large.elasticsearch

    • 20-50 nodes: c4.xlarge.elasticsearch

    • 50-100 nodes: c4.2xlarge.elasticsearch
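To see how quickly shard counts approach the 10,000 rule of thumb, the sketch below counts shard copies (each stored as a Lucene index) for a hypothetical set of daily log indices; the index layout is illustrative:

```python
def lucene_index_count(indices):
    """Total shard copies (each a separate Lucene index) in a cluster.

    `indices` maps index name -> (primary_shards, replica_count).
    Every primary shard and every replica copy is its own Lucene index.
    """
    return sum(primaries * (1 + replicas)
               for primaries, replicas in indices.values())

# Hypothetical example: 500 daily log indices, each with
# 5 primary shards and 1 replica.
daily_logs = {"logs-%03d" % day: (5, 1) for day in range(500)}
print(lucene_index_count(daily_logs))  # 5000
```

Here 500 indices already account for 5,000 shards, half the rule-of-thumb limit, which is why retention policies that close or delete old time-based indices matter for CPU as well as disk.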