OpenSearch Node Concurrent Recoveries - Setting Too High / Low


By Opster Team

Updated: Oct 17, 2023


What it means

What are node concurrent recoveries settings in OpenSearch?

The node concurrent recoveries settings determine the maximum number of shard recoveries that can run at the same time on each node.

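Recovering shards consumes both disk I/O and network bandwidth. If the setting is too high, many simultaneous recoveries can saturate these resources and slow down indexing and search on the nodes involved, so it is advisable to limit the number of shards that can be recovered to or from a given node at any one time.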

If, on the other hand, the setting is too low, recovery may be slower than necessary, or shards may not be recovered at all. While replicas are missing, the cluster stays yellow and has less redundancy and search capacity than planned; if primary shards cannot be recovered, the cluster stays red and writes to the affected shards fail. Either state can persist for a long time.
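One way to check whether recoveries are backing up is to compare the shard counts in the cluster health response with the recoveries that are actually running. The requests below are a sketch using standard OpenSearch APIs; what counts as a backlog depends on the size and load of the cluster.

# initializing_shards and unassigned_shards show how much recovery work is pending
GET _cluster/health

# Lists only the recoveries currently in progress, with their source and target nodes
GET _cat/recovery?active_only=true&v

# Without a request body, explains the first unassigned shard the API finds
GET _cluster/allocation/explain

If the allocation explain output reports the shard as "throttled", the shard is waiting for a free recovery slot rather than failing outright, which suggests the concurrent recovery limits are what is holding recovery back.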

There are a number of different settings that are similar but have subtle differences:

cluster.routing.allocation.node_concurrent_incoming_recoveries (default 2)

How many concurrent incoming shard recoveries (normally replicas) are allowed to happen on a node.

cluster.routing.allocation.node_concurrent_outgoing_recoveries (default 2)

How many concurrent outgoing shard recoveries are allowed to happen on a node.

cluster.routing.allocation.node_concurrent_recoveries (default 2)

This is a convenience setting that sets both cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries to the same value.

cluster.routing.allocation.node_initial_primaries_recoveries (default 4)

This setting differs from the ones above because it governs the initial recovery of primary shards from data already on the node's local disk, for example after a full cluster restart. Because these recoveries don't copy data over the network, more of them can run in parallel on the same node.
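Because node_concurrent_recoveries simply sets the incoming and outgoing limits together, the two can also be set independently when one side needs a different value. The request below is a sketch with illustrative values (3 incoming, 2 outgoing), not a recommendation for any particular cluster:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": 3,
    "cluster.routing.allocation.node_concurrent_outgoing_recoveries": 2
  }
}

An incoming or outgoing value that is set explicitly takes precedence over the value derived from node_concurrent_recoveries.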

How to resolve it

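Check the current cluster settings. By default the response lists only settings that have been explicitly set; include_defaults=true adds the default values, and flat_settings=true prints each setting under its full name:

GET _cluster/settings?include_defaults=true&flat_settings=true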

If necessary, change the concurrent recovery settings. In general, the defaults are good values to use.

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 2
  }
}
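Transient settings are cleared on a full cluster restart. To keep a value across restarts, the same setting can be written under persistent instead. Setting a value to null removes the override, so a transient value left over from an earlier change can be cleared and the persistent value (or the default) applies again. A sketch of both steps:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 2
  }
}

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": null
  }
}

Clearing the transient value matters here because a transient setting overrides the persistent one for as long as it is present.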