What it means
The node concurrent recoveries setting determines the maximum number of shards that can be recovered at once from each node. Recovering shards requires both disk and network resources, so it is advisable to limit the number of shards that can be recovered from a given node at any one time.
If, on the other hand, the concurrent recoveries setting is too low, shards may recover more slowly than usual, or not at all. While recovery is pending, the cluster has fewer replicas than planned, which can cause performance issues, and an index may even be left unwritable, with the cluster staying yellow or red for a long period of time.
There are a number of different settings that are similar but have subtle differences:
cluster.routing.allocation.node_concurrent_incoming_recoveries (default 2)
How many concurrent incoming shard recoveries (normally replicas) are allowed to happen on a node.
cluster.routing.allocation.node_concurrent_outgoing_recoveries (default 2)
How many concurrent outgoing shard recoveries are allowed to happen on a node.
cluster.routing.allocation.node_concurrent_recoveries (default 2)
This is a convenience setting that sets both cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries at once.
cluster.routing.allocation.node_initial_primaries_recoveries (default 4)
This is different from the above because it involves the recovery of a primary shard using data from the local disk. Because these operations don’t require networking, a larger number of operations may be carried out in parallel on the same node.
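To make the relationship between these four settings concrete, here is a minimal Python sketch that resolves the limits a node would end up with from a flat mapping of setting names to values. The function name effective_recovery_limits is hypothetical, and the sketch assumes that an explicitly set incoming or outgoing value takes precedence over the combined node_concurrent_recoveries value.

```python
# Defaults as listed in the settings above.
DEFAULT_CONCURRENT_RECOVERIES = 2
DEFAULT_INITIAL_PRIMARIES_RECOVERIES = 4

PREFIX = "cluster.routing.allocation."


def effective_recovery_limits(flat_settings):
    """Resolve per-node recovery limits from a flat {setting_name: value} mapping.

    Assumption: an explicit incoming/outgoing value overrides the combined
    node_concurrent_recoveries value, which in turn overrides the default.
    """
    combined = int(
        flat_settings.get(
            PREFIX + "node_concurrent_recoveries", DEFAULT_CONCURRENT_RECOVERIES
        )
    )
    return {
        "incoming": int(
            flat_settings.get(PREFIX + "node_concurrent_incoming_recoveries", combined)
        ),
        "outgoing": int(
            flat_settings.get(PREFIX + "node_concurrent_outgoing_recoveries", combined)
        ),
        "initial_primaries": int(
            flat_settings.get(
                PREFIX + "node_initial_primaries_recoveries",
                DEFAULT_INITIAL_PRIMARIES_RECOVERIES,
            )
        ),
    }
```

With an empty mapping the function returns the defaults (2 incoming, 2 outgoing, 4 initial primaries). Setting only node_concurrent_recoveries to 4 raises both the incoming and outgoing limits to 4.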
How to resolve it
Check the current cluster settings:
GET _cluster/settings
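By default this request returns only settings that have been set explicitly, so an untouched cluster may show empty persistent and transient sections. Adding the include_defaults=true and flat_settings=true query parameters also returns the defaults, with dotted setting names as keys. The Python sketch below shows one way to read the value in effect from such a response. The helper name lookup_setting is hypothetical, and the sample response is hand-written for illustration, not real cluster output.

```python
KEY = "cluster.routing.allocation.node_concurrent_recoveries"


def lookup_setting(response, key):
    """Return (value, source) for a setting in a flat-settings response.

    Transient settings take precedence over persistent ones, which take
    precedence over the defaults.
    """
    for source in ("transient", "persistent", "defaults"):
        section = response.get(source, {})
        if key in section:
            return section[key], source
    return None, None


# Hand-written sample of a parsed JSON response (illustrative only).
sample_response = {
    "persistent": {KEY: "2"},
    "transient": {KEY: "4"},
    "defaults": {"cluster.routing.allocation.node_initial_primaries_recoveries": "4"},
}

value, source = lookup_setting(sample_response, KEY)
print(value, source)  # prints: 4 transient
```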
If necessary, change the concurrent recovery settings. In general the defaults are good values to use.
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 2
  }
}
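If you generate the request body from a script, a small helper can keep the setting name free of typos and also produce the body that resets the setting. The Python sketch below is one possible approach. The function name build_settings_update and its validation rules are assumptions made for illustration. It relies on the cluster settings API treating a null value as a request to reset the setting.

```python
import json

KEY = "cluster.routing.allocation.node_concurrent_recoveries"


def build_settings_update(value, scope="transient"):
    """Build a JSON body for PUT _cluster/settings.

    Passing value=None produces JSON null, which resets the setting.
    The checks below are local sanity checks of this sketch, not the
    cluster's own validation.
    """
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    if value is not None:
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError("value must be a non-negative integer or None")
    return json.dumps({scope: {KEY: value}})


print(build_settings_update(2))
print(build_settings_update(None, scope="persistent"))
```

The first call produces the body for a transient update to 2. The second produces a persistent body with a null value, which removes any persistent override.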