Setting Up Zone Awareness for Shard Allocation in Elasticsearch

Opster Team

October 2021

To set up zone awareness in Elasticsearch, we recommend you run the Elasticsearch Configuration Check-Up. The Check-Up will also help you optimize other important settings in Elasticsearch to improve performance and ensure high availability for your crucial data.

What is zone awareness and why is it used?

Elasticsearch is a distributed system designed to keep data available even when individual nodes become unavailable. To achieve this, Elasticsearch creates replicas of shards. If the node holding a primary shard crashes or becomes unavailable, one of its replica shards is promoted to primary, and a new replica is created to replace the copy that was lost.

Elasticsearch never allocates a replica shard to the same node as its primary shard. This avoids the situation where both copies of the shard sit on the same crashed node and neither is available.

Zone awareness takes this concept one step further. What happens if, instead of a single server going down, we lose an entire rack of servers because of a networking or power supply failure? If the primary and replica shards are on two separate servers that share the same rack, we could still lose both copies, and our index would be unavailable.

To prevent this, Elasticsearch provides a feature called “zone awareness” (also known as shard allocation awareness). We define “availability zones” in our data center, each as independent as possible from the others in terms of power supply, networking, and so on. We then tell Elasticsearch which nodes are in which availability zone, and Elasticsearch spreads the primary and replica shards over as many availability zones as possible.
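To make the idea concrete, here is a deliberately simplified model in Python. It is not Elasticsearch's allocator. The rule that at most ceil(copies / zones) copies of a shard may share a zone is an assumption chosen to mimic "spread over as many zones as possible", and the node and zone names are hypothetical. The model compares that rule with a placement that only respects the one-copy-per-node rule, then checks whether each placement survives the loss of a whole zone.

```python
import math
from collections import Counter


def naive_allocate(copies, nodes):
    """Only the one-copy-per-node rule: take the first `copies` nodes."""
    return nodes[:copies]


def aware_allocate(copies, nodes):
    """Simplified zone-aware placement of one shard's copies.

    nodes is a list of (node_name, zone) pairs. At most
    ceil(copies / number_of_zones) copies go into any one zone,
    and each node receives at most one copy.
    """
    if not nodes:
        return []
    zones = {zone for _, zone in nodes}
    cap = math.ceil(copies / len(zones))
    per_zone = Counter()
    placed = []
    for name, zone in nodes:
        if len(placed) == copies:
            break
        if per_zone[zone] < cap:
            placed.append((name, zone))
            per_zone[zone] += 1
    return placed


def survives_zone_loss(placed, lost_zone):
    """True if at least one copy is outside the zone that went down."""
    return any(zone != lost_zone for _, zone in placed)


# Hypothetical cluster: two zones with two nodes each.
nodes = [("node-1", "az1"), ("node-2", "az1"), ("node-3", "az2"), ("node-4", "az2")]
# One primary plus one replica = 2 copies.
print(naive_allocate(2, nodes))  # [('node-1', 'az1'), ('node-2', 'az1')]
print(aware_allocate(2, nodes))  # [('node-1', 'az1'), ('node-3', 'az2')]
```

With the naive placement, both copies end up in az1, so losing that zone loses the shard. With the zone-aware placement, either zone can fail and one copy remains.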

Many commercial data centers and cloud providers use the concept of “availability zones”. For example, in AWS the zones within a region are identified by a letter suffix (a, b, c) on the region name: eu-central-1a, eu-central-1b, eu-central-1c, and so on.

How to set up zone awareness

Put the following lines into elasticsearch.yml for each node in the cluster:

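# The zone this particular node runs in (az1, az2, az3, ...)
node.attr.availabilityzone: az1
# Use the availabilityzone attribute for shard allocation awareness (same on every node)
cluster.routing.allocation.awareness.attributes: availabilityzone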

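Set node.attr.availabilityzone to the zone each node actually belongs to. The awareness.attributes line is the same on every node. Typically you would use two or three zones, for example az1, az2 and az3.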
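Because only the node.attr line differs from node to node, the settings can be generated from an inventory of where each node runs. The Python sketch below is a hypothetical helper, not an Elastic tool. The inventory, node names and function names are made up for illustration, and the code simply renders the two configuration lines shown above for each node.

```python
AWARENESS_ATTR = "availabilityzone"  # attribute name used in the example above


def node_config(zone, attr=AWARENESS_ATTR):
    """Return the zone-awareness lines of elasticsearch.yml for a node in `zone`."""
    if not zone:
        raise ValueError("every node needs a zone value")
    return (
        f"node.attr.{attr}: {zone}\n"
        f"cluster.routing.allocation.awareness.attributes: {attr}\n"
    )


def render_configs(inventory):
    """Map each node name to its config lines; inventory maps node name -> zone."""
    return {node: node_config(zone) for node, zone in inventory.items()}


# Hypothetical inventory: which zone each node physically runs in.
inventory = {"es-1": "az1", "es-2": "az2", "es-3": "az3"}
for node, text in render_configs(inventory).items():
    print(f"# {node}")
    print(text)
```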

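Restart each node. Elasticsearch will then spread the copies of each shard across the zones, so that a replica shard is not allocated in the same zone as its primary shard, provided there are at least as many available zones as copies of each shard.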
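To check the result, you can compare where each shard copy lives with the zone of its node. The Python sketch below is minimal and makes a few assumptions:

- The cluster is unsecured and reachable at http://localhost:9200. A real cluster would need authentication and TLS added.
- Nodes carry the availabilityzone attribute from the configuration above.
- The helper function names are our own.

The sketch reads the _cat/nodeattrs and _cat/shards APIs with format=json.

```python
import json
import urllib.request
from collections import defaultdict

ZONE_ATTR = "availabilityzone"  # attribute name from elasticsearch.yml


def fetch_json(base_url, path):
    """GET a cat API endpoint that was requested with format=json."""
    with urllib.request.urlopen(base_url + path) as resp:
        return json.loads(resp.read().decode("utf-8"))


def node_zones(nodeattrs, attr=ZONE_ATTR):
    """Map node name -> zone from _cat/nodeattrs rows (node, attr, value)."""
    return {row["node"]: row["value"] for row in nodeattrs if row["attr"] == attr}


def shared_zone_shards(shards, zones):
    """Return (index, shard, zone) for every shard with 2+ copies in one zone.

    shards are _cat/shards rows (index, shard, node). Unassigned copies and
    copies on nodes without a known zone are ignored.
    """
    copies = defaultdict(list)
    for row in shards:
        zone = zones.get(row.get("node"))
        if zone is not None:
            copies[(row["index"], row["shard"])].append(zone)
    problems = []
    for (index, shard), zone_list in sorted(copies.items()):
        for zone in sorted(set(zone_list)):
            if zone_list.count(zone) > 1:
                problems.append((index, shard, zone))
    return problems


def check_cluster(base_url="http://localhost:9200"):
    """Fetch node attributes and shard locations, then report shared zones."""
    zones = node_zones(fetch_json(base_url, "/_cat/nodeattrs?format=json&h=node,attr,value"))
    shards = fetch_json(base_url, "/_cat/shards?format=json&h=index,shard,node")
    return shared_zone_shards(shards, zones)
```

On a cluster with two zones and one replica per index, check_cluster() should return an empty list. If an index has more copies than there are zones, some copies must share a zone, so entries for such indices are expected.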

Points to keep in mind

  • Zone awareness works in combination with other shard allocation rules, such as the disk usage watermarks. If these rules conflict, replica shards may be impossible to allocate and your cluster will turn Yellow. For example, if you lose a single node and it is not replaced immediately, the remaining nodes in the same zone may not have enough disk capacity to hold all the replicas that zone is required to store.
  • Make sure that you have a balanced number of nodes in each availability zone. Each zone holds a similar share of the shard copies (with two zones and one replica, each zone holds a full copy of the data), so a zone with fewer nodes ends up with more shards and more data on each of its nodes.
  • Although it is important to keep availability zones independent, don’t go to the extreme of putting nodes into geographically separate data centers unless they have extremely good connectivity between them. Elasticsearch expects the connections between the nodes of a cluster to be reliable, low-latency and high-bandwidth.
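The first two points can be checked with simple arithmetic. The Python sketch below is illustrative only, and its functions and data are hypothetical. In practice the node-to-zone mapping could come from _cat/nodeattrs, and used and total disk from _cat/allocation. The 0.85 default is assumed to match Elasticsearch's default low disk watermark of 85%. The headroom check is a rough aggregate that ignores how individual shards are packed onto nodes.

```python
from collections import Counter


def nodes_per_zone(zones):
    """Count nodes per zone; zones maps node name -> zone."""
    return Counter(zones.values())


def is_balanced(zones, tolerance=0):
    """True if zone sizes differ by at most `tolerance` nodes."""
    counts = nodes_per_zone(zones)
    if not counts:
        return True
    return max(counts.values()) - min(counts.values()) <= tolerance


def zone_can_absorb(lost_node, disk, zones, watermark=0.85):
    """Rough check: can the other nodes in lost_node's zone take its data
    while each stays below `watermark` of its disk?

    disk maps node name -> (used_bytes, total_bytes). Free headroom is summed
    across the zone; per-shard packing is ignored.
    """
    zone = zones[lost_node]
    lost_used, _ = disk[lost_node]
    headroom = sum(
        max(0.0, watermark * total - used)
        for node, (used, total) in disk.items()
        if node != lost_node and zones.get(node) == zone
    )
    return headroom >= lost_used
```

For example, if a zone has two nodes with 1000 bytes of disk each and the lost node held 500 bytes, a surviving node that already uses 700 bytes has only 150 bytes of headroom below 85%, so the zone cannot absorb the data and replicas would stay unassigned.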