If the OpenSearch cluster starts to reject indexing requests, there could be a number of causes. Generally, it indicates that one or more nodes cannot keep up with the volume of index, delete, update or bulk requests, so a queue builds up on those nodes. Once the indexing queue exceeds its maximum size (as defined in the thread pool settings; see Threadpools), the node starts to reject indexing requests.
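As a rough illustration of this condition, the following Python sketch compares each node's current queue depth with its configured maximum. It assumes the response shapes of `GET _nodes/thread_pool` (configured `queue_size`) and `GET _nodes/stats/thread_pool` (current `queue` and cumulative `rejected`), and that indexing requests are handled by the `write` thread pool. The function name and the sample data are hypothetical.

```python
def write_queue_usage(nodes_info, nodes_stats, pool="write"):
    """Return per-node queue usage for one thread pool.

    nodes_info:  parsed JSON from GET _nodes/thread_pool (assumed shape)
    nodes_stats: parsed JSON from GET _nodes/stats/thread_pool (assumed shape)
    """
    usage = {}
    for node_id, stats in nodes_stats["nodes"].items():
        pool_stats = stats["thread_pool"][pool]
        pool_info = nodes_info["nodes"][node_id]["thread_pool"][pool]
        max_size = pool_info["queue_size"]
        usage[stats["name"]] = {
            "queue": pool_stats["queue"],
            "queue_size": max_size,
            # A non-positive queue_size is treated here as "no fixed limit".
            "fill_ratio": pool_stats["queue"] / max_size if max_size > 0 else None,
            "rejected": pool_stats["rejected"],
        }
    return usage


# Hypothetical sample data, for illustration only.
sample_info = {
    "nodes": {"n1": {"thread_pool": {"write": {"queue_size": 10000}}}}
}
sample_stats = {
    "nodes": {
        "n1": {
            "name": "data-1",
            "thread_pool": {"write": {"queue": 9500, "rejected": 42}},
        }
    }
}
print(write_queue_usage(sample_info, sample_stats))
```

A fill ratio that stays close to 1.0 together with a growing `rejected` count would suggest that the node is at, or near, the point where it rejects requests.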
How to resolve it
You should check the state of the thread pool to find out whether the indexing rejections are always occurring on the same node, or are spread across all of the nodes.
- If the rejection is only happening on specific data nodes, then you may have a load balancing or sharding issue. See Loaded Data Nodes – Important OpenSearch Guide.
- Follow the tips in this guide to optimize indexing: Improve OpenSearch Indexing Speed with These Tips
- If the rejection is associated with high CPU usage, then this is generally the consequence of JVM garbage collection, which in turn is caused by configuration or query-related issues. For a discussion of JVM garbage collection, see: Heap Size Usage and JVM Garbage Collection in ES – A Detailed Guide.
- Queue rejection associated with high CPU usage may also be a symptom of memory swapping to disk, if swapping has not been properly disabled on the node. See: The Bootstrap Memory Lock Setting is Set to False – An OpenSearch Guide.
- If you have a large number of shards on your cluster, then you may have an issue with oversharding. Please see this guide on Shards Too Small (Oversharding) – A Detailed Guide.
- If you observe queue rejection on a node whose CPU is not saturated, then you may have an issue with disk write speed. Look at your monitoring data and check the disk IOPS. This is likely to be the case with rotating disks, and can also occur with block storage that has difficulty reaching the IOPS required by heavy indexing.
- If your issue is related to re-indexing rather than normal indexing, then please see this guide: Improve your OpenSearch Reindex Performance with These Tips.
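The first check described above (whether rejections come from specific nodes or from all of them) could be sketched in Python as follows. This assumes rows as returned by `GET _cat/thread_pool/write?format=json&h=node_name,name,active,queue,rejected`, where numeric values arrive as strings. The function name, the labels it returns and the sample rows are hypothetical.

```python
def classify_rejections(rows):
    """Summarise which nodes report write rejections.

    rows: list of dicts from the _cat/thread_pool JSON output (assumed shape).
    Returns a (label, affected_node_names) tuple.
    """
    rejected = {row["node_name"]: int(row["rejected"]) for row in rows}
    affected = sorted(name for name, count in rejected.items() if count > 0)
    if not affected:
        return "none", []
    if len(affected) < len(rejected):
        # Only some nodes reject: points towards load balancing or sharding.
        return "specific_nodes", affected
    # Every node rejects: points towards a cluster-wide cause.
    return "all_nodes", affected


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"node_name": "data-1", "name": "write", "rejected": "0"},
    {"node_name": "data-2", "name": "write", "rejected": "137"},
    {"node_name": "data-3", "name": "write", "rejected": "0"},
]
print(classify_rejections(sample_rows))
```

The `rejected` counter is cumulative since the node started, so comparing two snapshots taken a few minutes apart gives a better picture of where rejections are currently happening than a single reading.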