Elasticsearch Cluster Concurrent Rebalance High / Low

Opster Team

March 2021


In addition to reading this guide, we recommend you run the Elasticsearch Health Check-Up. It will detect issues and improve your Elasticsearch performance by analyzing your shard sizes, threadpools, memory, snapshots, disk watermarks and more.

The Elasticsearch Check-Up is free and requires no installation.


What it means

The cluster concurrent rebalance setting, cluster.routing.allocation.cluster_concurrent_rebalance, determines the maximum number of shards the cluster can move at any one time to rebalance the distribution of shards, and hence disk usage, across the nodes.

Moving a shard from one node to another consumes cluster resources such as disk I/O and network bandwidth. It's therefore advisable to keep the concurrent rebalance setting low, limiting the number of shards that can be moved so that the cluster doesn't use up too many resources moving shards at any one time. The default value for this setting is 2.

If, on the other hand, the concurrent rebalance setting is too low, the cluster may not be able to rebalance shards at all. Some nodes could then fill their disks and become unable to allocate shards, even though space is available on other nodes. This can turn the cluster yellow or red and prevent new data from being written to the affected indices.
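
To check whether the cluster is currently rebalancing, you can list its shards and look for any in the RELOCATING state. Here is a minimal check using the _cat API; the column and sort parameters are just one convenient selection:

GET _cat/shards?v&h=index,shard,prirep,state,node&s=state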

How to resolve it

Check the current cluster settings.

GET _cluster/settings
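
Note that GET _cluster/settings only returns settings that have been explicitly changed. To see the effective value even when it is still the default, you can include defaults in flattened form and look for cluster.routing.allocation.cluster_concurrent_rebalance in the output:

GET _cluster/settings?include_defaults=true&flat_settings=true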

If necessary, change the concurrent rebalance setting. Remember that the default value is 2.

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2
  }
}
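
A transient setting is lost when the whole cluster restarts; to make the change survive a full cluster restart, put it under "persistent" instead. The same update can be made from the command line with curl, assuming here that Elasticsearch is reachable on localhost:9200:

curl -X PUT "localhost:9200/_cluster/settings" -H "Content-Type: application/json" -d'
{
  "persistent": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2
  }
}'

Setting the value to null removes the override and restores the default.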

