Elasticsearch Split Brain Problem Explained

Last Update: February 2020

Overview

Elasticsearch is a distributed system, and each cluster may contain one or more nodes. For a cluster to become operational, a quorum of master eligible nodes must be available. By default, every node in Elasticsearch is master eligible. The elected master node is responsible for the cluster coordination tasks that manage the cluster state.

When you create a cluster, the quorum defaults to one, no matter how many nodes you configure. That means the cluster can operate as long as one master node is up. If you are running a production cluster with more than two nodes, however, you should configure an odd number of dedicated master nodes. Most clusters are configured with at least three dedicated master nodes, with the quorum (the minimum number of master eligible nodes that must be available) set to two.

Read below to see why an odd number of master nodes is recommended and why it is important to set the quorum explicitly. The quorum is controlled by the following settings in the elasticsearch.yml file on every node:

discovery.zen.ping.unicast.hosts: ["host1:tcp_port", "host2:tcp_port", "host3:tcp_port"]
discovery.zen.minimum_master_nodes: 2

Note: Under the discovery.zen.ping.unicast.hosts setting, list the host address and port of the master eligible nodes only, and do so on every node, including the master nodes themselves. A common mistake is to list every node in the cluster under this setting.
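
As a rough illustration of the note above, the elasticsearch.yml of a data-only node in a pre-7.0 cluster might look like the sketch below. The host names master1, master2 and master3 are placeholders rather than values from this article, 9300 is the default transport port, and node.master and node.data are the role settings used by pre-7.x versions.

```yaml
# elasticsearch.yml on a data-only node (pre-7.0); host names are hypothetical
node.master: false   # this node is not master eligible
node.data: true      # this node holds data

# List only the three master eligible nodes, not every node in the cluster
discovery.zen.ping.unicast.hosts: ["master1:9300", "master2:9300", "master3:9300"]
discovery.zen.minimum_master_nodes: 2
```

The same two discovery lines would appear unchanged on the three master eligible nodes; only the node role settings differ from node to node.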

The split-brain problem

At any given time, a healthy cluster has exactly one elected master among all the master eligible nodes. Split-brain is the situation in which the cluster has more than one master.

Take, for example, a cluster with two master eligible nodes, M1 and M2, and discovery.zen.minimum_master_nodes set to one. Split-brain can occur if both M1 and M2 are alive but the network between them is interrupted. When that happens, each node considers itself alone in the cluster, and because a single node satisfies the quorum of one, each elects itself as master. At this point the cluster has two master nodes, which is a split-brain situation.

Best practices to avoid the split-brain problem

The split-brain problem can be avoided by setting the minimum number of master nodes using the following formula:

minimum_master_nodes = (N/2)+1

Where N is the total number of master eligible nodes in the cluster. The division N/2 is rounded down to a whole number before 1 is added, so the result is always a strict majority of the master eligible nodes. For example, if the total number of master eligible nodes is 3, then 3/2 rounds down to 1 and minimum_master_nodes is set to 2.
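
To make the arithmetic concrete, here is a sketch for a hypothetical cluster with five master eligible nodes: 5/2 is 2 once the fraction is dropped, and adding 1 gives 3. The host names are placeholders, not values from this article.

```yaml
# elasticsearch.yml (pre-7.0) for a cluster with 5 master eligible nodes
# minimum_master_nodes = (5 / 2) + 1 = 2 + 1 = 3
discovery.zen.ping.unicast.hosts: ["master1:9300", "master2:9300", "master3:9300", "master4:9300", "master5:9300"]
discovery.zen.minimum_master_nodes: 3
```

With this value, a network partition that splits the five nodes into groups of three and two leaves only the group of three able to elect a master, so two masters cannot exist at the same time.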

Considerations for different Elasticsearch versions

The concept described in the previous sections applies to all Elasticsearch versions before 7.0. In version 7.0, the discovery module, which is responsible for these cluster communication settings, was completely reworked, and you no longer have to set the quorum of master nodes yourself. Elasticsearch now decides by itself which nodes are needed to form a quorum. The discovery.zen.minimum_master_nodes setting is no longer used, and discovery.zen.ping.unicast.hosts has been replaced by discovery.seed_hosts. A new setting, cluster.initial_master_nodes, defines the initial set of master eligible nodes during the cluster bootstrapping process.
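
For comparison, a minimal sketch of a 7.x configuration for three master eligible nodes might look like the following. The node names and host names are placeholders; cluster.initial_master_nodes is assumed here to list the node.name values of the master eligible nodes.

```yaml
# elasticsearch.yml on a 7.x master eligible node; all names are hypothetical
node.name: master1

# Replaces discovery.zen.ping.unicast.hosts
discovery.seed_hosts: ["master1:9300", "master2:9300", "master3:9300"]

# Consulted only when a brand-new cluster forms for the first time
cluster.initial_master_nodes: ["master1", "master2", "master3"]
```

Note that no quorum value appears in this sketch, since 7.x works out the quorum on its own.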


