Elasticsearch Shard Allocation is Unbalanced


Opster Team

Nov 2020


What it means when shard allocation is unbalanced

Elasticsearch will usually balance index shards evenly across all active data nodes in the cluster. This normally happens automatically, without any user intervention. If it is not happening, it is usually because certain cluster settings are preventing shard balancing from occurring as expected. In an extreme case, these settings may result in NO shards being allocated to an individual node.

There are two basic processes which govern how shards are distributed among the Elasticsearch nodes:

  1. Shard allocation, the algorithm by which Elasticsearch decides which unallocated shards should go on which nodes.
  2. Shard rebalancing, the process of moving a shard from one node to another.

Shard allocation explained

The cluster allocation explain API is very useful for debugging unbalanced nodes, or when your cluster is yellow or red and you don't understand why. Choose any shard of an index that you would expect to rebalance onto the node in question. The API explains why the shard is not allocated or, if it is allocated, why it has not been moved to another node.

GET /_cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}
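The explain response can be long, so it may help to reduce it to the deciders that actually said no. The following is a minimal Python sketch, assuming the response has already been fetched and parsed into a dict and that it carries the node_allocation_decisions list the explain API returns. The helper name blocking_deciders and the sample response (including its explanation text) are made up for illustration.

```python
def blocking_deciders(explain_response):
    """Map each node name to the (decider, explanation) pairs that said NO."""
    blocked = {}
    for node in explain_response.get("node_allocation_decisions", []):
        reasons = [
            (d["decider"], d["explanation"])
            for d in node.get("deciders", [])
            if d.get("decision") == "NO"
        ]
        if reasons:
            blocked[node["node_name"]] = reasons
    return blocked


# Abbreviated, made-up response used only to exercise the helper.
sample_response = {
    "index": "my-index",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "node_allocation_decisions": [
        {
            "node_name": "node-1",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "illustrative text: node is excluded by a filter",
                }
            ],
        },
        # A node that accepted the shard has no NO deciders to report.
        {"node_name": "node-2", "node_decision": "yes"},
    ],
}

for name, reasons in blocking_deciders(sample_response).items():
    for decider, explanation in reasons:
        print(f"{name}: {decider}: {explanation}")
```

The decider name usually points at the setting to look for, for example a filter decider suggests an include/exclude/require setting.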

At the most basic level, Elasticsearch will apply the following rules:

Shards should be shared out to achieve a similar number of shards on each node.

A replica shard will never be allocated on the same node as its primary shard.

Beyond that, the algorithm also tries to spread the shards of a given index across as many different data nodes as it can. Furthermore, a number of user-defined settings can also govern shard allocation and rebalancing; we address these below. These settings are often the reason why shards are not being allocated or rebalanced as expected.
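To check the two basic rules against a real cluster, you can count shards per node from the output of GET /_cat/shards?format=json. The sketch below is one possible way to do that in Python, assuming the rows have already been fetched and that each row has the index, shard, prirep, state and node columns. The function names, node names and sample rows are hypothetical.

```python
from collections import Counter


def shards_per_node(shard_rows, data_nodes):
    """Count STARTED shards on every data node, including nodes holding none."""
    counts = {name: 0 for name in data_nodes}
    for row in shard_rows:
        if row.get("state") == "STARTED" and row.get("node") in counts:
            counts[row["node"]] += 1
    return counts


def colocated_copies(shard_rows):
    """Return (index, shard, node) keys where two started copies share a node.

    Elasticsearch should never allow this, so the result is expected to be empty.
    """
    seen = Counter(
        (row["index"], row["shard"], row["node"])
        for row in shard_rows
        if row.get("state") == "STARTED"
    )
    return [key for key, count in seen.items() if count > 1]


# Made-up rows in the shape returned with format=json (values are strings).
sample_rows = [
    {"index": "my-index", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "my-index", "shard": "0", "prirep": "r", "state": "STARTED", "node": "node-2"},
    {"index": "my-index", "shard": "1", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "my-index", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "node": None},
]

counts = shards_per_node(sample_rows, ["node-1", "node-2", "node-3"])
print(counts)
print(colocated_copies(sample_rows))
```

A node that shows a count of zero while others hold shards is the extreme case described above, and is a good candidate for the allocation explain request.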

How to resolve unbalanced shards

Check whether any rebalancing is taking place (the relo column shows the number of shards currently relocating):

GET /_cat/health?v&ts=false

If so, then it is probably best to do nothing, and simply wait for Elasticsearch to rebalance the shards across the cluster as it sees fit.
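If you want to script this check, the text output of the request above can be parsed by pairing the header row with the value row. This is a rough Python sketch, assuming the output was captured as a string and that the cluster name contains no spaces; the function names and the sample output are illustrative only.

```python
def parse_cat_health(text):
    """Pair the header row of `_cat/health?v` output with its single value row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header, values = lines[0].split(), lines[1].split()
    return dict(zip(header, values))


def shards_in_motion(text):
    """Return the relocating, initializing and unassigned shard counts as ints."""
    row = parse_cat_health(text)
    return {column: int(row[column]) for column in ("relo", "init", "unassign")}


# Made-up output in the shape produced with ?v&ts=false.
sample_output = """\
cluster    status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
my-cluster green           3         3     12   6    2    0        0             0                  -                100.0%
"""

movement = shards_in_motion(sample_output)
print(movement)
if movement["relo"] > 0:
    print("Shards are relocating; waiting is probably the best option.")
```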

Check the cluster settings to see if there are any settings which are preventing rebalancing from taking place:

GET /_cluster/settings
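By default the response nests settings under persistent and transient, which makes individual keys hard to spot. The following Python sketch flattens an already-fetched response into dotted setting names and keeps only those under cluster.routing. The helper names and the sample response are assumptions made for illustration.

```python
def flatten(settings, prefix=""):
    """Turn nested settings such as {"cluster": {"routing": ...}} into dotted keys."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def routing_overrides(settings_response):
    """Return {(scope, setting_name): value} for every cluster.routing.* override."""
    found = {}
    for scope in ("persistent", "transient"):
        for name, value in flatten(settings_response.get(scope, {})).items():
            if name.startswith("cluster.routing."):
                found[(scope, name)] = value
    return found


# Made-up response used only to exercise the helpers.
sample_settings = {
    "persistent": {
        "cluster": {
            "routing": {
                "rebalance": {"enable": "none"},
                "allocation": {"exclude": {"_ip": "10.0.0.1"}},
            }
        }
    },
    "transient": {"indices": {"recovery": {"max_bytes_per_sec": "50mb"}}},
}

for (scope, name), value in routing_overrides(sample_settings).items():
    print(f"{scope}: {name} = {value}")
```

Only explicitly set values appear in this response, so anything the helper reports is an override of the defaults.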

In particular look for settings such as the following:

cluster.routing.rebalance.enable 

This setting should usually be all, which is the default. It is not a boolean: the accepted values are all, primaries, replicas and none.

PUT /_cluster/settings
{
  "persistent" : {
    "cluster.routing.rebalance.enable" : "all"
  }
}

cluster.routing.allocation.allow_rebalance 

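This setting is not a boolean either: the accepted values are always, indices_primaries_active and indices_all_active. It should usually be left at indices_all_active, the default in Elasticsearch 7.x, which allows rebalancing only once all shards in the cluster are active:

PUT /_cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.allow_rebalance" : "indices_all_active"
  }
}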

cluster.routing.allocation.awareness.attributes: zone

If awareness attributes are set, they could be preventing shards from being allocated to the node. If this is the case, then rather than deleting the setting, you should check that the attribute in question has been set properly on the node. For the example above, you would expect to find node.attr.zone in that node's elasticsearch.yml.
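One way to confirm which nodes actually carry the attribute is to compare the output of GET /_cat/nodeattrs?format=json with your list of data nodes. The Python sketch below assumes the rows are already fetched and contain the node, attr and value columns; the function name, node names and sample rows are hypothetical.

```python
def nodes_missing_attribute(attr_rows, data_nodes, attribute):
    """Return the data nodes that do not report the given custom attribute."""
    have = {row["node"] for row in attr_rows if row.get("attr") == attribute}
    return sorted(set(data_nodes) - have)


# Made-up rows in the shape returned with format=json.
sample_attrs = [
    {"node": "node-1", "attr": "zone", "value": "zone-a"},
    {"node": "node-2", "attr": "zone", "value": "zone-b"},
    {"node": "node-3", "attr": "xpack.installed", "value": "true"},
]

missing = nodes_missing_attribute(sample_attrs, ["node-1", "node-2", "node-3"], "zone")
print(missing)
```

Any node listed here is a candidate for a missing or misspelled node.attr entry in its configuration file.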

cluster.routing.allocation.include/exclude/require

Any of these settings could affect shard allocation.

If a node matches an exclude filter, or fails to match an include or require filter, your cluster will eventually end up with NO shards on that node. These settings are usually used because the intention is to decommission a particular node. If this is the case, then once no shards are left on the node, you should decommission the node as soon as possible.

If one of these settings has been left on your cluster by mistake, you can set it to null to remove it and restore the default.

PUT /_cluster/settings
{
  "persistent" : {
	"cluster.routing.allocation.include._ip" : null
  }
}

(Replace the setting with whatever setting you want to remove.)
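If several leftover settings need removing at once, the request body can be generated rather than typed by hand. This is a small Python sketch under the assumption that you send the resulting JSON to PUT /_cluster/settings yourself; the function name reset_settings_body is made up. Python's None serializes to the JSON null that Elasticsearch expects.

```python
import json


def reset_settings_body(setting_names, scope="persistent"):
    """Build a cluster settings body that resets each named setting to its default."""
    if scope not in ("persistent", "transient"):
        raise ValueError("scope must be 'persistent' or 'transient'")
    # None is written out as JSON null, which removes the override.
    return json.dumps({scope: {name: None for name in setting_names}}, indent=2)


body = reset_settings_body(["cluster.routing.allocation.include._ip"])
print(body)
```

A setting can be defined as persistent or transient, so check which scope it appears under in the cluster settings response and reset it in that same scope.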




