What does it mean?
If an Elasticsearch cluster starts to reject indexing requests, there could be a number of causes. Generally, it indicates that one or more nodes cannot keep up with the volume of index, delete, update or bulk requests, so a queue builds up on that node. Once the indexing queue exceeds the maximum queue size configured for the thread pool (see: Threadpools), the node starts to reject indexing requests.
How to resolve
You should check the state of the thread pool to find out whether the indexing rejections are always occurring on the same node, or are spread across all of the nodes.
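As a minimal sketch of that check, the snippet below reads the `_cat/thread_pool` API and groups rejections by node. The base URL and the pool name `write` are assumptions (older Elasticsearch versions used separate `index` and `bulk` pools), and the helper names are illustrative only. Note that the `rejected` counter is cumulative since the node started, so compare two readings over time rather than relying on a single value.

```python
import json
from urllib.request import urlopen


def fetch_thread_pool(base_url="http://localhost:9200", pool="write"):
    """Fetch thread pool stats per node as a list of dicts.

    base_url and pool are assumptions; adjust them for your cluster.
    """
    url = (
        f"{base_url}/_cat/thread_pool/{pool}"
        "?format=json&h=node_name,name,active,queue,rejected"
    )
    with urlopen(url) as resp:
        return json.load(resp)


def rejections_by_node(rows):
    """Sum the 'rejected' counter per node (the cat API returns strings)."""
    totals = {}
    for row in rows:
        node = row["node_name"]
        totals[node] = totals.get(node, 0) + int(row["rejected"])
    return totals


def summarize(rows):
    """Return ('none' | 'some' | 'all', sorted list of rejecting nodes)."""
    totals = rejections_by_node(rows)
    rejecting = sorted(n for n, count in totals.items() if count > 0)
    if not rejecting:
        return "none", rejecting
    if len(rejecting) == len(totals):
        return "all", rejecting
    return "some", rejecting


if __name__ == "__main__":
    # Sample rows in the shape returned by the cat API with format=json.
    sample = [
        {"node_name": "data-1", "name": "write", "active": "8", "queue": "200", "rejected": "1532"},
        {"node_name": "data-2", "name": "write", "active": "1", "queue": "0", "rejected": "0"},
    ]
    print(summarize(sample))
```

A result of `some` points towards the per-node causes (load balancing or sharding), while `all` suggests the cluster as a whole cannot keep up with the indexing volume.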
- If the rejection is only happening on specific data nodes, then you may have a load balancing or sharding issue. See Loaded Data Nodes – Important Elasticsearch Guide.
- Follow the tips in this guide to optimise indexing: Improve Elasticsearch Indexing Speed with These Tips
- If the rejection is associated with high CPU usage, then this is generally the consequence of JVM garbage collection, which in turn is caused by configuration or query-related issues. For a discussion of JVM garbage collection, see: Heap Size Usage and JVM Garbage Collection in ES – A Detailed Guide.
- Queue rejection associated with high CPU usage may also be a symptom of memory swapping to disk, if swapping has not been properly disabled on the node. See: The Bootstrap Memory Lock Setting is Set to False – An Elasticsearch Guide.
- If you have a large number of shards on your cluster, then you may have an issue with oversharding. Please see: Shards Too Small (Oversharding) – A Detailed Guide.
- If you observe queue rejection on a node whose CPU is not saturated, then you may have an issue with disk write speed. Look at your monitoring data and check the disk IOPS. This is likely to be the case with rotating disks, and can also occur with Elastic Block Store (EBS) volumes, which can have difficulty reaching the IOPS required by heavy indexing.
- If your issue is related to re-indexing rather than normal indexing, then please see this guide: Improve your Elasticsearch Reindex Performance with These Tips.
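The CPU-based rules in the list above can be sketched as a rough first-pass triage. This is only an illustration: the 85% CPU threshold is an arbitrary assumption, the function names are hypothetical, and the input is assumed to combine the `rejected` counter from `_cat/thread_pool` with the `cpu` column from `_cat/nodes` for each node.

```python
def likely_cause(rejected, cpu_percent, cpu_threshold=85):
    """Map one node's rejection count and CPU usage to an area to investigate.

    cpu_threshold is an assumed cut-off for "saturated", not an official value.
    """
    if rejected == 0:
        return "no rejections"
    if cpu_percent >= cpu_threshold:
        # High CPU: usually garbage collection, sometimes swapping to disk.
        return "high CPU: check JVM garbage collection and swapping"
    # Rejections without CPU saturation point towards slow disk writes.
    return "CPU not saturated: check disk write speed (IOPS)"


def triage(nodes, cpu_threshold=85):
    """Apply likely_cause to each node dict with 'name', 'rejected' and 'cpu'."""
    return {
        n["name"]: likely_cause(int(n["rejected"]), int(n["cpu"]), cpu_threshold)
        for n in nodes
    }


if __name__ == "__main__":
    # Hypothetical readings; values are strings, as the cat APIs return them.
    sample = [
        {"name": "data-1", "rejected": "1532", "cpu": "97"},
        {"name": "data-2", "rejected": "40", "cpu": "35"},
        {"name": "data-3", "rejected": "0", "cpu": "20"},
    ]
    for node, hint in triage(sample).items():
        print(node, "->", hint)
```

The output is a hint about where to look first, not a diagnosis; confirm it against garbage collection logs, swap settings and disk metrics before changing anything.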